c_can driver sometimes sends first two bytes filled with zeros

All of lore.kernel.org
 help / color / mirror / Atom feed

* c_can driver sometimes sends first two bytes filled with zeros
@ 2016-05-12  9:23 Richard Andrysek
  2016-05-16 18:14 ` Thor Thayer
  2016-05-23 18:19 ` Wolfgang Grandegger
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Andrysek @ 2016-05-12  9:23 UTC (permalink / raw)
  To: linux-can

We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:

#!/bin/sh

echo "Press [CTRL+C] to stop.."
while true
do
               cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
done

With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.

If we add some delays between messages, like this:

do
               cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5
	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
               usleep 5

done

It works fine.

We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.

Have somebody idea how to find a reason for that? 

Richard


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-12  9:23 c_can driver sometimes sends first two bytes filled with zeros Richard Andrysek
@ 2016-05-16 18:14 ` Thor Thayer
  2016-05-17 17:18   ` AW: " Richard Andrysek
  2016-05-23 18:19 ` Wolfgang Grandegger
  1 sibling, 1 reply; 14+ messages in thread
From: Thor Thayer @ 2016-05-16 18:14 UTC (permalink / raw)
  To: Richard Andrysek, linux-can

Hi Richard,

On 05/12/2016 04:23 AM, Richard Andrysek wrote:
> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>
> #!/bin/sh
>
> echo "Press [CTRL+C] to stop.."
> while true
> do
>                 cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
> done
>
> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>
> If we add some delays between messages, like this:
>
> do
>                 cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
>
> done
>
> It works fine.
>
> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>
> Have somebody idea how to find a reason for that?
>
> Richard
>
How old are your canutils? I had a similar issue on an older version of 
cansend (4.0.6 from the Pengutronix site - with a copyright date of 2009).

There is an old thread (http://comments.gmane.org/gmane.linux.can/2339) 
that suggests adding a delay in the cansend utility to work around a 
poll/select bug.

I added a delay for 4.0.6 but you may want to grab the latest from 
github (https://github.com/linux-can/can-utils).

> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* AW: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-16 18:14 ` Thor Thayer
@ 2016-05-17 17:18   ` Richard Andrysek
  2016-05-18 15:35     ` Thor Thayer
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Andrysek @ 2016-05-17 17:18 UTC (permalink / raw)
  To: tthayer; +Cc: linux-can

Hi Thor,

I've made two use cases with can0:

a)	RAW socket can with 200us steps
b)	8 CAN messages with 150us step in a broadcast mode

Both works perfect, only my PCAN USB has from time to time an overrun. I think I need another logger:)

But if I take case b) and reduce steps to 128us I get two frames with wrong two first bytes per 3min.
Similar behavior in case a).

Questions:
   Q1) What I still don't understand, if there is a failure I do not get any error value. Why that?
   Q2) Is there somewhere missing some kind of locking mechanism?

-----Ursprüngliche Nachricht-----
Von: Thor Thayer [mailto:tthayer@opensource.altera.com] 
Gesendet: Montag, 16. Mai 2016 20:15
An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
Betreff: Re: c_can driver sometimes sends first two bytes filled with zeros

Hi Richard,

On 05/12/2016 04:23 AM, Richard Andrysek wrote:
> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>
> #!/bin/sh
>
> echo "Press [CTRL+C] to stop.."
> while true
> do
>                 cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde 
> 0xde done
>
> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>
> If we add some delays between messages, like this:
>
> do
>                 cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
>
> done
>
> It works fine.
>
> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>
> Have somebody idea how to find a reason for that?
>
> Richard
>
How old are your canutils? I had a similar issue on an older version of cansend (4.0.6 from the Pengutronix site - with a copyright date of 2009).

There is an old thread (http://comments.gmane.org/gmane.linux.can/2339)
that suggests adding a delay in the cansend utility to work around a poll/select bug.

I added a delay for 4.0.6 but you may want to grab the latest from github (https://github.com/linux-can/can-utils).

> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: AW: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-17 17:18   ` AW: " Richard Andrysek
@ 2016-05-18 15:35     ` Thor Thayer
       [not found]       ` <0120733A154AE74CA608A286CE7FFD2621D9CB60@rg-contact.RG.local>
  0 siblings, 1 reply; 14+ messages in thread
From: Thor Thayer @ 2016-05-18 15:35 UTC (permalink / raw)
  To: Richard Andrysek, linux-can

Hi Richard,

On 05/17/2016 12:18 PM, Richard Andrysek wrote:
> Hi Thor,
>
> I've made two use cases with can0:
>
> a)	RAW socket can with 200us steps
> b)	8 CAN messages with 150us step in a broadcast mode
>
> Both works perfect, only my PCAN USB has from time to time an overrun. I think I need another logger:)
>
> But if I take case b) and reduce steps to 128us I get two frames with wrong two first bytes per 3min.
> Similar behavior in case a).
>
I'd like to clarify. When you refer to steps, are you talking about usec 
delays between messages using the usleep function as described in your 
initial post?

And to clarify the setup: a) is a RAW socket (is this a custom user 
space program?) sending 1 message every 200us and b) uses the older 
4.0.6 cansend sending 1 message with 150us between each message repeated 
8 times? Or is b) sending 8 messages back-to-back (no usleep() in 
between messages) with 150us between the 8 messages?

And the problem is that when you reduce the delay between messages to 
128us, you start losing the first 2 bytes in 2 frames if you record for 
3 minutes.

> Questions:
>     Q1) What I still don't understand, if there is a failure I do not get any error value. Why that?
>     Q2) Is there somewhere missing some kind of locking mechanism?
>

> -----Ursprüngliche Nachricht-----
> Von: Thor Thayer [mailto:tthayer@opensource.altera.com]
> Gesendet: Montag, 16. Mai 2016 20:15
> An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
> Betreff: Re: c_can driver sometimes sends first two bytes filled with zeros
>
> Hi Richard,
>
> On 05/12/2016 04:23 AM, Richard Andrysek wrote:
>> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>>
>> #!/bin/sh
>>
>> echo "Press [CTRL+C] to stop.."
>> while true
>> do
>>                  cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde
>> 0xde done
>>
>> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>>
>> If we add some delays between messages, like this:
>>
>> do
>>                  cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>>
>> done
>>
>> It works fine.
>>
>> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>>
>> Have somebody idea how to find a reason for that?
>>
>> Richard
>>
> How old are your canutils? I had a similar issue on an older version of cansend (4.0.6 from the Pengutronix site - with a copyright date of 2009).
>
> There is an old thread (http://comments.gmane.org/gmane.linux.can/2339)
> that suggests adding a delay in the cansend utility to work around a poll/select bug.
>
> I added a delay for 4.0.6 but you may want to grab the latest from github (https://github.com/linux-can/can-utils).
>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-can"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: AW: AW: c_can driver sometimes sends first two bytes filled with zeros
       [not found]       ` <0120733A154AE74CA608A286CE7FFD2621D9CB60@rg-contact.RG.local>
@ 2016-05-19 23:00         ` Thor Thayer
  2016-05-20 12:01           ` AW: " Richard Andrysek
  0 siblings, 1 reply; 14+ messages in thread
From: Thor Thayer @ 2016-05-19 23:00 UTC (permalink / raw)
  To: Richard Andrysek, linux-can; +Cc: wg, mkl

Hi Richard,

On 05/19/2016 09:07 AM, Richard Andrysek wrote:
> Hi Thor,
>
> I am sending my test program and a makefile. The application accept up to 2 arguments:
> ## send 8 messages in 2ms cycle
> # test  8 2000
>
> ## default 16 messages in 2.6ms (the zip file data)
> # test
>
> In the zip file you can see my logging files (channel 1 and channel 2), on the channel 2 you can find failures
> in the "Failure_dump.txt".
>
Yes, I can see the zero bytes in your data.

Am I correct that both can0 and can1 are receiving the same data? If I'm 
understanding, can0 seems to be fine but can1 periodically shows 
corruption with the first 2 bytes set to 0?

Since the attached program uses a write, I'm confused about who is 
sending the data? Is that the PCAN USB you referred to in an earlier email?

Also, where is the 2.6ms that you refer to and is in the code? From the 
data, it seems like the spacing between frames is ~.120ms and ~1msec 
between 16 frame bursts.

> Initialization of can channels is done from the shell:
>
> # ip link set can0 up type can bitrate 1000000
> # ip link set can1 up type can bitrate 1000000
>
> The cpu load is max 5%.
>
> Question1:
> I've checked a driver it looks good. Except one line in the function :
>
> static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
> 				    struct net_device *dev)
> {
> 	struct can_frame *frame = (struct can_frame *)skb->data;
> 	struct c_can_priv *priv = netdev_priv(dev);
> 	u32 idx, obj;
>
> 	if (can_dropped_invalid_skb(dev, skb))
> 		return NETDEV_TX_OK;
> 	/*
> 	 * This is not a FIFO. C/D_CAN sends out the buffers
> 	 * prioritized. The lowest buffer number wins.
> 	 */
> 	idx = fls(atomic_read(&priv->tx_active));  /*!!!!! Why so !!!! */
> 	...
> 	...
> 	/* Update the active bits */
> 	atomic_add((1 << idx), &priv->tx_active);
>
> Are these atomic operations correct? atomic_add I understand, but shall not be atomic_read used like this:
>
>               Atomic block start
>               idx = fls(read(&priv->tx->active))
>               remove_idx_from(&priv->tx->active, idx)
>               Atomic block stop	
>
I'm not sure about this. I'm including the maintainers in the CC.

> Question2: What can make corrupted CAN messages on the channel 2?
>
Hmm. I've only worked with the first CAN so I can't speak for the 2nd 
can device.

> Ch.
>
> Richard
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* AW: AW: AW: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-19 23:00         ` AW: " Thor Thayer
@ 2016-05-20 12:01           ` Richard Andrysek
  2016-05-23 14:22             ` Thor Thayer
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Andrysek @ 2016-05-20 12:01 UTC (permalink / raw)
  To: tthayer, linux-can; +Cc: wg, mkl

Thanks for informing maintainers. If needed we can Skype or phone or ... 

Gateway devices can0 and can1 send CAN messages. I have new logger "PCAN-USB-Pro"(I've tested also with a CANCaseXL) with two channels which only receives messages. One logger's channel is connected with can0 and one with can1. On one channel I do not see any failure (healthy channel), on the second one from time to time (bad channel).

The physical transmission of one CAN message on the bus takes 108+bit stuffing time us = ~118us. As you can see in the c-code, in the while-loop, first are written 16 messages ("burst") into each peripherals (can0,can1) and after that it waits 2600us for a next cycle. The peripheral itself has message buffers (small extra RAM), in the Linux driver are programmed 16 buffers for TX. So I can write 16 messages into buffers and then give enough time to be physically transmitted on the bus - 2600us (variable "delayTime_us".   

Real one cycle takes 2822 because:  "time of write calls" + "delayTime_us" => it seems that "time of 32 write calls" is roughly ~222us.

One additional test: if I physically disconnect a "healthy channel", which means physically it does not send any message, the "bad channel" still have a problem.

Q: What I see in the code that there is writing into peripheral in 16bit steps, could it be related with that?
	File: "c_can.c"
	Function : c_can_setup_tx_object
	...	
	for (i = 0; i < frame->can_dlc; i += 2) {
		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
				frame->data[i] | (frame->data[i + 1] << 8));
	} 
	...
   Q2: Is it safe to use u8 like this: "(frame->data[i + 1] << 8)"?

Ch.

Richard

-----Ursprüngliche Nachricht-----
Von: Thor Thayer [mailto:tthayer@opensource.altera.com] 
Gesendet: Freitag, 20. Mai 2016 01:01
An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
Cc: wg@grandegger.com; mkl@pengutronix.de
Betreff: Re: AW: AW: c_can driver sometimes sends first two bytes filled with zeros

Hi Richard,

On 05/19/2016 09:07 AM, Richard Andrysek wrote:
> Hi Thor,
>
> I am sending my test program and a makefile. The application accept up to 2 arguments:
> ## send 8 messages in 2ms cycle
> # test  8 2000
>
> ## default 16 messages in 2.6ms (the zip file data) # test
>
> In the zip file you can see my logging files (channel 1 and channel 
> 2), on the channel 2 you can find failures in the "Failure_dump.txt".
>
Yes, I can see the zero bytes in your data.

Am I correct that both can0 and can1 are receiving the same data? If I'm understanding, can0 seems to be fine but can1 periodically shows corruption with the first 2 bytes set to 0?

Since the attached program uses a write, I'm confused about who is sending the data? Is that the PCAN USB you referred to in an earlier email?

Also, where is the 2.6ms that you refer to and is in the code? From the data, it seems like the spacing between frames is ~.120ms and ~1msec between 16 frame bursts.

> Initialization of can channels is done from the shell:
>
> # ip link set can0 up type can bitrate 1000000 # ip link set can1 up 
> type can bitrate 1000000
>
> The cpu load is max 5%.
>
> Question1:
> I've checked a driver it looks good. Except one line in the function :
>
> static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
> 				    struct net_device *dev)
> {
> 	struct can_frame *frame = (struct can_frame *)skb->data;
> 	struct c_can_priv *priv = netdev_priv(dev);
> 	u32 idx, obj;
>
> 	if (can_dropped_invalid_skb(dev, skb))
> 		return NETDEV_TX_OK;
> 	/*
> 	 * This is not a FIFO. C/D_CAN sends out the buffers
> 	 * prioritized. The lowest buffer number wins.
> 	 */
> 	idx = fls(atomic_read(&priv->tx_active));  /*!!!!! Why so !!!! */
> 	...
> 	...
> 	/* Update the active bits */
> 	atomic_add((1 << idx), &priv->tx_active);
>
> Are these atomic operations correct? atomic_add I understand, but shall not be atomic_read used like this:
>
>               Atomic block start
>               idx = fls(read(&priv->tx->active))
>               remove_idx_from(&priv->tx->active, idx)
>               Atomic block stop	
>
I'm not sure about this. I'm including the maintainers in the CC.

> Question2: What can make corrupted CAN messages on the channel 2?
>
Hmm. I've only worked with the first CAN so I can't speak for the 2nd can device.

> Ch.
>
> Richard
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: AW: AW: AW: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-20 12:01           ` AW: " Richard Andrysek
@ 2016-05-23 14:22             ` Thor Thayer
  0 siblings, 0 replies; 14+ messages in thread
From: Thor Thayer @ 2016-05-23 14:22 UTC (permalink / raw)
  To: Richard Andrysek, linux-can; +Cc: wg, mkl

Thanks for the explanation of your setup.

On 05/20/2016 07:01 AM, Richard Andrysek wrote:
> Thanks for informing maintainers. If needed we can Skype or phone or ...
>
> Gateway devices can0 and can1 send CAN messages. I have new logger "PCAN-USB-Pro"(I've tested also with a CANCaseXL) with two channels which only receives messages. One logger's channel is connected with can0 and one with can1. On one channel I do not see any failure (healthy channel), on the second one from time to time (bad channel).
>
> The physical transmission of one CAN message on the bus takes 108+bit stuffing time us = ~118us. As you can see in the c-code, in the while-loop, first are written 16 messages ("burst") into each peripherals (can0,can1) and after that it waits 2600us for a next cycle. The peripheral itself has message buffers (small extra RAM), in the Linux driver are programmed 16 buffers for TX. So I can write 16 messages into buffers and then give enough time to be physically transmitted on the bus - 2600us (variable "delayTime_us".
>
> Real one cycle takes 2822 because:  "time of write calls" + "delayTime_us" => it seems that "time of 32 write calls" is roughly ~222us.
>
> One additional test: if I physically disconnect a "healthy channel", which means physically it does not send any message, the "bad channel" still have a problem.
>
That is interesting.

> Q: What I see in the code that there is writing into peripheral in 16bit steps, could it be related with that?
> 	File: "c_can.c"
> 	Function : c_can_setup_tx_object
> 	...	
> 	for (i = 0; i < frame->can_dlc; i += 2) {
> 		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
> 				frame->data[i] | (frame->data[i + 1] << 8));
> 	}
> 	...
>     Q2: Is it safe to use u8 like this: "(frame->data[i + 1] << 8)"?
>
Good point. According to the C spec, if an int can represent all values 
of the original type, the value is converted to an int.

> Ch.
>
> Richard
>
> -----Ursprüngliche Nachricht-----
> Von: Thor Thayer [mailto:tthayer@opensource.altera.com]
> Gesendet: Freitag, 20. Mai 2016 01:01
> An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
> Cc: wg@grandegger.com; mkl@pengutronix.de
> Betreff: Re: AW: AW: c_can driver sometimes sends first two bytes filled with zeros
>
> Hi Richard,
>
> On 05/19/2016 09:07 AM, Richard Andrysek wrote:
>> Hi Thor,
>>
>> I am sending my test program and a makefile. The application accept up to 2 arguments:
>> ## send 8 messages in 2ms cycle
>> # test  8 2000
>>
>> ## default 16 messages in 2.6ms (the zip file data) # test
>>
>> In the zip file you can see my logging files (channel 1 and channel
>> 2), on the channel 2 you can find failures in the "Failure_dump.txt".
>>
> Yes, I can see the zero bytes in your data.
>
> Am I correct that both can0 and can1 are receiving the same data? If I'm understanding, can0 seems to be fine but can1 periodically shows corruption with the first 2 bytes set to 0?
>
> Since the attached program uses a write, I'm confused about who is sending the data? Is that the PCAN USB you referred to in an earlier email?
>
> Also, where is the 2.6ms that you refer to and is in the code? From the data, it seems like the spacing between frames is ~.120ms and ~1msec between 16 frame bursts.
>
>> Initialization of can channels is done from the shell:
>>
>> # ip link set can0 up type can bitrate 1000000 # ip link set can1 up
>> type can bitrate 1000000
>>
>> The cpu load is max 5%.
>>
>> Question1:
>> I've checked a driver it looks good. Except one line in the function :
>>
>> static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
>> 				    struct net_device *dev)
>> {
>> 	struct can_frame *frame = (struct can_frame *)skb->data;
>> 	struct c_can_priv *priv = netdev_priv(dev);
>> 	u32 idx, obj;
>>
>> 	if (can_dropped_invalid_skb(dev, skb))
>> 		return NETDEV_TX_OK;
>> 	/*
>> 	 * This is not a FIFO. C/D_CAN sends out the buffers
>> 	 * prioritized. The lowest buffer number wins.
>> 	 */
>> 	idx = fls(atomic_read(&priv->tx_active));  /*!!!!! Why so !!!! */
>> 	...
>> 	...
>> 	/* Update the active bits */
>> 	atomic_add((1 << idx), &priv->tx_active);
>>
>> Are these atomic operations correct? atomic_add I understand, but shall not be atomic_read used like this:
>>
>>                Atomic block start
>>                idx = fls(read(&priv->tx->active))
>>                remove_idx_from(&priv->tx->active, idx)
>>                Atomic block stop	
>>
> I'm not sure about this. I'm including the maintainers in the CC.
>
>> Question2: What can make corrupted CAN messages on the channel 2?
>>
> Hmm. I've only worked with the first CAN so I can't speak for the 2nd can device.
>
>> Ch.
>>
>> Richard
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-12  9:23 c_can driver sometimes sends first two bytes filled with zeros Richard Andrysek
  2016-05-16 18:14 ` Thor Thayer
@ 2016-05-23 18:19 ` Wolfgang Grandegger
  2016-06-01  9:40   ` AW: " Richard Andrysek
  1 sibling, 1 reply; 14+ messages in thread
From: Wolfgang Grandegger @ 2016-05-23 18:19 UTC (permalink / raw)
  To: Richard Andrysek, linux-can

Hello,

Am 12.05.2016 um 11:23 schrieb Richard Andrysek:
> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>
> #!/bin/sh
>
> echo "Press [CTRL+C] to stop.."
> while true
> do
>                 cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
> done
>
> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>
> If we add some delays between messages, like this:
>
> do
>                 cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
>
> done
>
> It works fine.
>
> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>
> Have somebody idea how to find a reason for that?

The official "cansend" does not support "--loop". The official canutils
have cangen and canfdtest for more thorough testing. Anyway, does
"ip -s -d link show" report any errors? And could you run
"candump any,0:0,#FFFFFFFF" while sending. Does it also list the 
messages with the wrong data?

Wolfgang.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* AW: c_can driver sometimes sends first two bytes filled with zeros
  2016-05-23 18:19 ` Wolfgang Grandegger
@ 2016-06-01  9:40   ` Richard Andrysek
  2016-06-01 13:09     ` Wolfgang Grandegger
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Andrysek @ 2016-06-01  9:40 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: linux-can, Thor Thayer, mkl

Hi Wolfgang,

Sorry for one week break. Good idea with dumps, see answers below.

>> The official "cansend" does not support "--loop".
I've made the extra test program, please see my email from " 20 May 14:01 2016	Richard Andrysek".
With this test I've made following steps.

>>ip  -s -d link show
After 1 hour I've called this command
# ip -s -d link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 50 prop-seg 7 phase-seg1 7 phase-seg2 5 sjw 1
          c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
          clock 100000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    85337216   10667152 0       0       0       0
3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 50 prop-seg 7 phase-seg1 7 phase-seg2 5 sjw 1
          c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
          clock 100000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    85337216   10667152 0       0       0       0

>> run "candump any,0:0,#FFFFFFFF"
I can run following commands, I let to run for 5min, 10 min, and more:
  # candump can0 | grep -v "01 02 03 04"
  interface = can0, family = 29, type = 3, proto = 1
   Or
 # candump can1 | grep -v "01 02 03 04"
 interface = can1, family = 29, type = 3, proto = 1 

Nothing except a first line, which is OK.

Summary:
    If I understand it correctly socketcan is OK. On the bus I have logging without any error, so from Altera to PEAK it is also OK. It is somewhere between "socketcan layer" and peripheral out buffer. What can I do now? E.g. shall I try to make the  atomic call over " priv->write_reg(...)", if yes how?

CH.

Richard
    

-----Ursprüngliche Nachricht-----
Von: Wolfgang Grandegger [mailto:wg@grandegger.com] 
Gesendet: Montag, 23. Mai 2016 20:20
An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
Betreff: Re: c_can driver sometimes sends first two bytes filled with zeros

Hello,

Am 12.05.2016 um 11:23 schrieb Richard Andrysek:
> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>
> #!/bin/sh
>
> echo "Press [CTRL+C] to stop.."
> while true
> do
>                 cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde 
> 0xde done
>
> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>
> If we add some delays between messages, like this:
>
> do
>                 cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>                 usleep 5
>
> done
>
> It works fine.
>
> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>
> Have somebody idea how to find a reason for that?

The official "cansend" does not support "--loop". The official canutils have cangen and canfdtest for more thorough testing. Anyway, does "ip -s -d link show" report any errors? And could you run "candump any,0:0,#FFFFFFFF" while sending. Does it also list the messages with the wrong data?

Wolfgang.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-06-01  9:40   ` AW: " Richard Andrysek
@ 2016-06-01 13:09     ` Wolfgang Grandegger
  2016-06-10 10:49       ` Andy Haydon
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Grandegger @ 2016-06-01 13:09 UTC (permalink / raw)
  To: Richard Andrysek; +Cc: linux-can, Thor Thayer, mkl

Hello Richard,

Am 01.06.2016 um 11:40 schrieb Richard Andrysek:
> Hi Wolfgang,
>
> Sorry for one week break. Good idea with dumps, see answers below.
>
>>> The official "cansend" does not support "--loop".
> I've made the extra test program, please see my email from " 20 May 14:01 2016	Richard Andrysek".
> With this test I've made following steps.
>
>>> ip  -s -d link show
> After 1 hour I've called this command
> # ip -s -d link show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
>      RX: bytes  packets  errors  dropped overrun mcast
>      0          0        0       0       0       0
>      TX: bytes  packets  errors  dropped carrier collsns
>      0          0        0       0       0       0
> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>      link/can  promiscuity 0
>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>            bitrate 1000000 sample-point 0.750

>            tq 50 prop-seg 7 phase-seg1 7 phase-seg2 5 sjw 1
>            c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>            clock 100000000
>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>            0          0          0          0          0          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      0          0        0       0       0       0
>      TX: bytes  packets  errors  dropped carrier collsns
>      85337216   10667152 0       0       0       0
> 3: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>      link/can  promiscuity 0
>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>            bitrate 1000000 sample-point 0.750
>            tq 50 prop-seg 7 phase-seg1 7 phase-seg2 5 sjw 1
>            c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>            clock 100000000
>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>            0          0          0          0          0          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      0          0        0       0       0       0
>      TX: bytes  packets  errors  dropped carrier collsns
>      85337216   10667152 0       0       0       0

OK, no errors reported.

>>> run "candump any,0:0,#FFFFFFFF"
> I can run following commands, I let to run for 5min, 10 min, and more:
>    # candump can0 | grep -v "01 02 03 04"
>    interface = can0, family = 29, type = 3, proto = 1
>     Or
>   # candump can1 | grep -v "01 02 03 04"
>   interface = can1, family = 29, type = 3, proto = 1
>
> Nothing except a first line, which is OK.

Ok, anyway, please use the official CAN utilities [1] sooner than later.


> Summary:
>      If I understand it correctly socketcan is OK. On the bus I have logging without any error, so from Altera to PEAK it is also OK. It is somewhere between "socketcan layer" and peripheral out buffer. What can I do now? E.g. shall I try to make the  atomic call over " priv->write_reg(...)", if yes how?

Making "priv->write_regs() atomic is something I would try as well. My 
suspicion is that something goes wrong writing(/reading) the controller 
registers.

[1] https://github.com/linux-can/can-utils

Wolfgang.

> CH.
>
> Richard
>
>
> -----Ursprüngliche Nachricht-----
> Von: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Gesendet: Montag, 23. Mai 2016 20:20
> An: Richard Andrysek <richard.andrysek@gomtec.de>; linux-can@vger.kernel.org
> Betreff: Re: c_can driver sometimes sends first two bytes filled with zeros
>
> Hello,
>
> Am 12.05.2016 um 11:23 schrieb Richard Andrysek:
>> We can reproduce an issue with the canutils. We send messages in the loop with non-zero bytes and from time to time we get first two bytes of the message with zero values. The test script looks so:
>>
>> #!/bin/sh
>>
>> echo "Press [CTRL+C] to stop.."
>> while true
>> do
>>                  cansend can1 --loop=15 -i 933 0xde 0xde 0xde 0xde 0xde
>> 0xde done
>>
>> With CAN analyzer we see normally the right message, but in cycles ~1min we see first two bytes are zero.
>>
>> If we add some delays between messages, like this:
>>
>> do
>>                  cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>> 	cansend can1 --loop=1 -i 933 0xde 0xde 0xde 0xde 0xde 0xde
>>                  usleep 5
>>
>> done
>>
>> It works fine.
>>
>> We use Altera Cyclone V, where the  c_can driver is used. It runs with Linux kernel 3.16, but I've checked 4.5 version of a driver and it is a same one.
>>
>> Have somebody idea how to find a reason for that?
>
> The official "cansend" does not support "--loop". The official canutils have cangen and canfdtest for more thorough testing. Anyway, does "ip -s -d link show" report any errors? And could you run "candump any,0:0,#FFFFFFFF" while sending. Does it also list the messages with the wrong data?
>
> Wolfgang.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-06-01 13:09     ` Wolfgang Grandegger
@ 2016-06-10 10:49       ` Andy Haydon
  2016-06-10 12:55         ` Wolfgang Grandegger
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Haydon @ 2016-06-10 10:49 UTC (permalink / raw)
  To: linux-can

Hi Wolfgang,

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> Making "priv->write_regs() atomic is something I would try as well. My 
> suspicion is that something goes wrong writing(/reading) the controller 
> registers.

We have also been looking at a similar issue over the last few days. In our 
case, bytes 0 and 1 and bytes 4 and 5 of an 8 byte message can be corrupted 
randomly - not necessarily zeroed. Bytes 2 and 3 and bytes 6 and 7 are never
corrupted. The Tx task is running as a high priority real time thread in 
PREEMPT_RT.

I've found that so far the only thing that fixes this issue is to perform 
the writes to the data registers as 32 bit operations something like this 
(for 3.18.20):

--- a/drivers/net/can/c_can/c_can.c	2015-08-07 20:08:04.000000000 +0100
+++ b/drivers/net/can/c_can/c_can.c	2016-06-09 14:46:51.497895357 +0100
@@ -331,9 +332,19 @@

 	priv->write_reg(priv, C_CAN_IFACE(MSGCTRL_REG, iface), ctrl);

-	for (i = 0; i < frame->can_dlc; i += 2) {
-		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
-				frame->data[i] | (frame->data[i + 1] << 8));
+// this doesn't work for Cyclone V
+//	for (i = 0; i < frame->can_dlc; i += 2) {
+//		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
+//				frame->data[i] | (frame->data[i + 1] << 8));
+
+// replaced with this as a workaround
+	u32 data = 0;
+	for (i = 0; i < frame->can_dlc; i += 4) {
+		data = (u32)frame->data[i];
+		data |= (u32)frame->data[i + 1] << 8;
+		data |= (u32)frame->data[i + 2] << 16;
+		data |= (u32)frame->data[i + 3] << 24;
+		priv->write_reg32(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 
2, data);
 	}
 }

At the moment I can't explain this behaviour, or if this is really the only 
way to fix the issue for the Cyclone V. 
The tests that I've done so far seem to indicate that the data is correct 
in the driver right up to the point at which it is actually written to the 
CAN peripheral registers, so I think that your suspicion is correct.

Regards,
Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-06-10 10:49       ` Andy Haydon
@ 2016-06-10 12:55         ` Wolfgang Grandegger
  2016-06-10 13:12           ` Andy Haydon
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Grandegger @ 2016-06-10 12:55 UTC (permalink / raw)
  To: Andy Haydon, linux-can; +Cc: Richard Andrysek

Hello Andy,

Am 10.06.2016 um 12:49 schrieb Andy Haydon:
> Hi Wolfgang,
>
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>
>>
>> Making "priv->write_regs() atomic is something I would try as well. My
>> suspicion is that something goes wrong writing(/reading) the controller
>> registers.
>
> We have also been looking at a similar issue over the last few days. In our
> case, bytes 0 and 1 and bytes 4 and 5 of an 8 byte message can be corrupted
> randomly - not necessarily zeroed. Bytes 2 and 3 and bytes 6 and 7 are never
> corrupted. The Tx task is running as a high priority real time thread in
> PREEMPT_RT.
>
> I've found that so far the only thing that fixes this issue is to perform
> the writes to the data registers as 32 bit operations something like this
> (for 3.18.20):
>
> --- a/drivers/net/can/c_can/c_can.c	2015-08-07 20:08:04.000000000 +0100
> +++ b/drivers/net/can/c_can/c_can.c	2016-06-09 14:46:51.497895357 +0100
> @@ -331,9 +332,19 @@
>
>   	priv->write_reg(priv, C_CAN_IFACE(MSGCTRL_REG, iface), ctrl);
>
> -	for (i = 0; i < frame->can_dlc; i += 2) {
> -		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
> -				frame->data[i] | (frame->data[i + 1] << 8));
> +// this doesn't work for Cyclone V
> +//	for (i = 0; i < frame->can_dlc; i += 2) {
> +//		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
> +//				frame->data[i] | (frame->data[i + 1] << 8));
> +
> +// replaced with this as a workaround
> +	u32 data = 0;
> +	for (i = 0; i < frame->can_dlc; i += 4) {
> +		data = (u32)frame->data[i];
> +		data |= (u32)frame->data[i + 1] << 8;
> +		data |= (u32)frame->data[i + 2] << 16;
> +		data |= (u32)frame->data[i + 3] << 24;
> +		priv->write_reg32(priv, C_CAN_IFACE(DATA1_REG, iface) + i /
> 2, data);
>   	}
>   }

Interesting observation! Richard (on CC now), does the patch also fix 
your similar issues?

> At the moment I can't explain this behaviour, or if this is really the only
> way to fix the issue for the Cyclone V.

Maybe the hardware does not like the 16-bit accesses. BTW: How is the 
device configured? C_CAN with 32-bit accesses or D_CAN? Anyway, real 
32-bit accesses would be faster anyway.

> The tests that I've done so far seem to indicate that the data is correct
> in the driver right up to the point at which it is actually written to the
> CAN peripheral registers, so I think that your suspicion is correct.

Maybe some hardware guys from Altera could help.

Wolfgang.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-06-10 12:55         ` Wolfgang Grandegger
@ 2016-06-10 13:12           ` Andy Haydon
  2016-06-10 13:36             ` Wolfgang Grandegger
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Haydon @ 2016-06-10 13:12 UTC (permalink / raw)
  To: linux-can

Hi Wolfgang,

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> Hello Andy,
> 
> Am 10.06.2016 um 12:49 schrieb Andy Haydon:
> > Hi Wolfgang,
> >
> > Wolfgang Grandegger <wg <at> grandegger.com> writes:
> >
> >>
> >> Making "priv->write_regs() atomic is something I would try as well. My
> >> suspicion is that something goes wrong writing(/reading) the controller
> >> registers.
> >
> > We have also been looking at a similar issue over the last few days. In 
our
> > case, bytes 0 and 1 and bytes 4 and 5 of an 8 byte message can be 
corrupted
> > randomly - not necessarily zeroed. Bytes 2 and 3 and bytes 6 and 7 are 
never
> > corrupted. The Tx task is running as a high priority real time thread in
> > PREEMPT_RT.
> >
> > I've found that so far the only thing that fixes this issue is to 
perform
> > the writes to the data registers as 32 bit operations something like 
this
> > (for 3.18.20):
> >
> > --- a/drivers/net/can/c_can/c_can.c	2015-08-07 20:08:04.000000000 +0100
> > +++ b/drivers/net/can/c_can/c_can.c	2016-06-09 14:46:51.497895357 +0100
> >  <at>  <at>  -331,9 +332,19  <at>  <at> 
> >
> >   	priv->write_reg(priv, C_CAN_IFACE(MSGCTRL_REG, iface), ctrl);
> >
> > -	for (i = 0; i < frame->can_dlc; i += 2) {
> > -		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
> > -				frame->data[i] | (frame->data[i + 1] << 8));
> > +// this doesn't work for Cyclone V
> > +//	for (i = 0; i < frame->can_dlc; i += 2) {
> > +//		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
> > +//				frame->data[i] | (frame->data[i + 1] << 8));
> > +
> > +// replaced with this as a workaround
> > +	u32 data = 0;
> > +	for (i = 0; i < frame->can_dlc; i += 4) {
> > +		data = (u32)frame->data[i];
> > +		data |= (u32)frame->data[i + 1] << 8;
> > +		data |= (u32)frame->data[i + 2] << 16;
> > +		data |= (u32)frame->data[i + 3] << 24;
> > +		priv->write_reg32(priv, C_CAN_IFACE(DATA1_REG, iface) + i /
> > 2, data);
> >   	}
> >   }
> 
> Interesting observation! Richard (on CC now), does the patch also fix 
> your similar issues?
> 
> > At the moment I can't explain this behaviour, or if this is really the 
only
> > way to fix the issue for the Cyclone V.
> 
> Maybe the hardware does not like the 16-bit accesses. BTW: How is the 
> device configured? C_CAN with 32-bit accesses or D_CAN? Anyway, real 
> 32-bit accesses would be faster anyway.

The peripheral is D_CAN with 32-bit registers. 
There are many other 16-bit accesses in the driver, but it is only the data 
registers that seem not to like them. And the read from the data registers 
works fine as 16-bit accesses.
I agree that 32-bit accesses would be faster in this case. If we were to 
make a kernel patch, would the correct way to do it be to add a 32-bit 
access property to the d_can device tree binding?

> 
> > The tests that I've done so far seem to indicate that the data is 
correct
> > in the driver right up to the point at which it is actually written to 
the
> > CAN peripheral registers, so I think that your suspicion is correct.
> 
> Maybe some hardware guys from Altera could help.

I've created a service request with Altera. Hopefully I can eventually get 
this investigated by their hardware guys.

Regards,
Andy





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: c_can driver sometimes sends first two bytes filled with zeros
  2016-06-10 13:12           ` Andy Haydon
@ 2016-06-10 13:36             ` Wolfgang Grandegger
  0 siblings, 0 replies; 14+ messages in thread
From: Wolfgang Grandegger @ 2016-06-10 13:36 UTC (permalink / raw)
  To: Andy Haydon, linux-can

Am 10.06.2016 um 15:12 schrieb Andy Haydon:
> Hi Wolfgang,
>
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>
>>
>> Hello Andy,
>>
>> Am 10.06.2016 um 12:49 schrieb Andy Haydon:
>>> Hi Wolfgang,
>>>
>>> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>>>
>>>>
>>>> Making "priv->write_regs() atomic is something I would try as well. My
>>>> suspicion is that something goes wrong writing(/reading) the controller
>>>> registers.
>>>
>>> We have also been looking at a similar issue over the last few days. In
> our
>>> case, bytes 0 and 1 and bytes 4 and 5 of an 8 byte message can be
> corrupted
>>> randomly - not necessarily zeroed. Bytes 2 and 3 and bytes 6 and 7 are
> never
>>> corrupted. The Tx task is running as a high priority real time thread in
>>> PREEMPT_RT.
>>>
>>> I've found that so far the only thing that fixes this issue is to
> perform
>>> the writes to the data registers as 32 bit operations something like
> this
>>> (for 3.18.20):
>>>
>>> --- a/drivers/net/can/c_can/c_can.c	2015-08-07 20:08:04.000000000 +0100
>>> +++ b/drivers/net/can/c_can/c_can.c	2016-06-09 14:46:51.497895357 +0100
>>>   <at>  <at>  -331,9 +332,19  <at>  <at>
>>>
>>>    	priv->write_reg(priv, C_CAN_IFACE(MSGCTRL_REG, iface), ctrl);
>>>
>>> -	for (i = 0; i < frame->can_dlc; i += 2) {
>>> -		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
>>> -				frame->data[i] | (frame->data[i + 1] << 8));
>>> +// this doesn't work for Cyclone V
>>> +//	for (i = 0; i < frame->can_dlc; i += 2) {
>>> +//		priv->write_reg(priv, C_CAN_IFACE(DATA1_REG, iface) + i / 2,
>>> +//				frame->data[i] | (frame->data[i + 1] << 8));
>>> +
>>> +// replaced with this as a workaround
>>> +	u32 data = 0;
>>> +	for (i = 0; i < frame->can_dlc; i += 4) {
>>> +		data = (u32)frame->data[i];
>>> +		data |= (u32)frame->data[i + 1] << 8;
>>> +		data |= (u32)frame->data[i + 2] << 16;
>>> +		data |= (u32)frame->data[i + 3] << 24;
>>> +		priv->write_reg32(priv, C_CAN_IFACE(DATA1_REG, iface) + i /
>>> 2, data);
>>>    	}
>>>    }
>>
>> Interesting observation! Richard (on CC now), does the patch also fix
>> your similar issues?
>>
>>> At the moment I can't explain this behaviour, or if this is really the
> only
>>> way to fix the issue for the Cyclone V.
>>
>> Maybe the hardware does not like the 16-bit accesses. BTW: How is the
>> device configured? C_CAN with 32-bit accesses or D_CAN? Anyway, real
>> 32-bit accesses would be faster anyway.
>
> The peripheral is D_CAN with 32-bit registers.
> There are many other 16-bit accesses in the driver, but it is only the data
> registers that seem not to like them. And the read from the data registers
> works fine as 16-bit accesses.
> I agree that 32-bit accesses would be faster in this case. If we were to
> make a kernel patch, would the correct way to do it be to add a 32-bit
> access property to the d_can device tree binding?

My understanding is that D_CAN can perform 32bit accesses while the 
C_CAN always does 16bit accesses. Therefore I would use 
[read|write]_reg32 "if (priv->type != BOSCH_D_CAN)" using code similar to:

   http://lxr.free-electrons.com/source/drivers/net/can/flexcan.c#L492

Wolfgang.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-06-10 13:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-12  9:23 c_can driver sometimes sends first two bytes filled with zeros Richard Andrysek
2016-05-16 18:14 ` Thor Thayer
2016-05-17 17:18   ` AW: " Richard Andrysek
2016-05-18 15:35     ` Thor Thayer
     [not found]       ` <0120733A154AE74CA608A286CE7FFD2621D9CB60@rg-contact.RG.local>
2016-05-19 23:00         ` AW: " Thor Thayer
2016-05-20 12:01           ` AW: " Richard Andrysek
2016-05-23 14:22             ` Thor Thayer
2016-05-23 18:19 ` Wolfgang Grandegger
2016-06-01  9:40   ` AW: " Richard Andrysek
2016-06-01 13:09     ` Wolfgang Grandegger
2016-06-10 10:49       ` Andy Haydon
2016-06-10 12:55         ` Wolfgang Grandegger
2016-06-10 13:12           ` Andy Haydon
2016-06-10 13:36             ` Wolfgang Grandegger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.