* pch_can: Data transmission stops after dropped packet
@ 2012-11-14 15:39 Michael Pellegrini
  2012-11-14 21:40 ` Michael Pellegrini
                   ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-14 15:39 UTC (permalink / raw)
  To: linux-can

Hi All,

I am trying to use the pch_can driver to interface with a CANopen system.
I am encountering a serious issue where, at random intervals, the CAN interface
will lock up and stop transmitting data.  Data reception continues normally.
The only way I've found to resolve the issue so far is to issue an
"ifconfig canx down" command followed by an "ifconfig canx up" command.
After the interface is reset through these commands, the TX packet dropped count
and the TX packet carrier count increment by 1.

I am running Ubuntu 12.04 with kernel version 3.2.0-23.

Has anyone else seen this problem or have a fix for it?

Thanks,
Mike





* Re: pch_can: Data transmission stops after dropped packet
  2012-11-14 15:39 pch_can: Data transmission stops after dropped packet Michael Pellegrini
@ 2012-11-14 21:40 ` Michael Pellegrini
  2012-11-15  7:18 ` Oliver Hartkopp
  2012-11-15 12:35 ` Steffen Rose
  2 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-14 21:40 UTC (permalink / raw)
  To: linux-can

I should add that when this problem occurs, trying to transmit on the CAN
interface returns the error "No buffer space available" (ENOBUFS).

- Mike





* Re: pch_can: Data transmission stops after dropped packet
  2012-11-14 15:39 pch_can: Data transmission stops after dropped packet Michael Pellegrini
  2012-11-14 21:40 ` Michael Pellegrini
@ 2012-11-15  7:18 ` Oliver Hartkopp
  2012-11-15 13:13   ` Wolfgang Grandegger
  2012-11-15 16:12   ` Michael Pellegrini
  2012-11-15 12:35 ` Steffen Rose
  2 siblings, 2 replies; 79+ messages in thread
From: Oliver Hartkopp @ 2012-11-15  7:18 UTC (permalink / raw)
  To: Michael Pellegrini
  Cc: linux-can, tomoya.rohm, Wolfgang Grandegger, Bhupesh SHARMA,
	Alexander Stein, federico.vaga, Giancarlo Asnaghi

On 14.11.2012 16:39, Michael Pellegrini wrote:


> I am trying to use the pch_can driver to interface with a CANopen system.
> I am encountering a serious issue where, at random intervals, the CAN interface
> will lock up and stop transmitting data.  Data reception continues normally.
> The only way I've found to resolve the issue so far is to issue an
> "ifconfig canx down" command followed by an "ifconfig canx up" command.
> After the interface is reset through these commands, the TX packet dropped count
> and the TX packet carrier count increment by 1.
> 
> I am running Ubuntu 12.04 with kernel version 3.2.0-23.
> 
> Has anyone else seen this problem or have a fix for it?


Hello Mike,

I have put the relevant people in CC; they probably know better than I do.

Regarding the pch_can driver:

1. Can you please provide the output of 

	ip -details link show can0

(if can0 is your device with the problem)
at start time and again after the problem has occurred?

2. Can you send the output of

	candump can0,0~0,#FFFFFFFF -e

from start time until the problem happens?

3. It turned out that the pch_can controller is based on the C_CAN
CAN controller IP core. Therefore Wolfgang posted an idea to replace the
pch_can driver with a PCI-attached C_CAN driver:

http://thread.gmane.org/gmane.linux.can/815

I don't know the current state of the tests/replacement.

But a general C_CAN PCI driver has been contributed by Federico Vaga, which
became a mainline driver in June:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5b92da0443c2585e31b64e86c2e1b8e22845d4bb

I don't know whether this driver can be used out of the box once the PCI IDs
of the PCH_CAN board are added ...

But maybe the guys in CC know ;-)

Regards,
Oliver





* Re: pch_can: Data transmission stops after dropped packet
  2012-11-14 15:39 pch_can: Data transmission stops after dropped packet Michael Pellegrini
  2012-11-14 21:40 ` Michael Pellegrini
  2012-11-15  7:18 ` Oliver Hartkopp
@ 2012-11-15 12:35 ` Steffen Rose
  2012-11-15 18:26   ` Michael Pellegrini
  2 siblings, 1 reply; 79+ messages in thread
From: Steffen Rose @ 2012-11-15 12:35 UTC (permalink / raw)
  To: linux-can

> I am encountering a serious issue where, at random intervals, the CAN
> interface will lock up and stop transmitting data.  Data reception
> continues normally. 

Hello,

The situation is similar to the one I described on 8.10. in the thread
"unsure - socketCAN blocking":

socketCAN with EMS-USB interface
3.2.0-32-generic 64bit
running in a virtualBox
ip link add dev can0 type can

Sometimes:
Transmission stops
Reception keeps working

-- 
Steffen Rose



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15  7:18 ` Oliver Hartkopp
@ 2012-11-15 13:13   ` Wolfgang Grandegger
  2012-11-15 16:23     ` Michael Pellegrini
                       ` (2 more replies)
  2012-11-15 16:12   ` Michael Pellegrini
  1 sibling, 3 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-15 13:13 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Michael Pellegrini, linux-can, tomoya.rohm, Bhupesh SHARMA,
	Alexander Stein, federico.vaga, Giancarlo Asnaghi

On 11/15/2012 08:18 AM, Oliver Hartkopp wrote:
> On 14.11.2012 16:39, Michael Pellegrini wrote:
> 
> 
>> I am trying to use the pch_can driver to interface with a CANopen system.
>> I am encountering a serious issue where, at random intervals, the CAN interface
>> will lock up and stop transmitting data.  Data reception continues normally.
>> The only way I've found to resolve the issue so far is to issue an
>> "ifconfig canx down" command followed by an "ifconfig canx up" command.
>> After the interface is reset through these commands, the TX packet dropped count
>> and the TX packet carrier count increment by 1.
>>
>> I am running Ubuntu 12.04 with kernel version 3.2.0-23.
>>
>> Has anyone else seen this problem or have a fix for it?
> 
> 
> Hello Mike,
> 
> i put the relevant people into CC that probably know better than me.
> 
> Btw. using the pch_can driver:
> 
> 1. Can you please provide the output of 
> 
> 	ip -details link show can0
> 
> (if can0 is your device with the problem)
> at start time and when the problem happened?
> 
> 2. Can you send the output of
> 
> 	candump can0,0~0,#FFFFFFFF -e
> 
> from start time until the problem happens?
> 
> 3. It turned out that the pch_can controller is created upon the C_CAN
> CAN controller IP-core. Therefore Wolfgang posted an idea to replace the
> pch_can driver with a PCI-attached C_CAN driver:
> 
> http://thread.gmane.org/gmane.linux.can/815
> 
> I don't know the current state of the tests/replacement.

As you can see from this thread, the replacement patch has not yet run
successfully on such a platform. But I would definitely like to get
rid of the obsolete pch_can driver. Testers are welcome! Michael, would
you be able/willing to test a patch, even with a recent kernel version?

> But a general C_CAN PCI driver has been contributed by Frederico Vaga which
> became a mainline driver in June
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5b92da0443c2585e31b64e86c2e1b8e22845d4bb
> 
> Don't know if this driver can be used out of the box, when the PCI IDs of the
> PCH_CAN board are added ...
> 
> But maybe the guys in CC know ;-)

You need a PCI driver first.

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15  7:18 ` Oliver Hartkopp
  2012-11-15 13:13   ` Wolfgang Grandegger
@ 2012-11-15 16:12   ` Michael Pellegrini
  2012-11-20 18:59     ` Wolfgang Grandegger
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-15 16:12 UTC (permalink / raw)
  To: linux-can

Oliver Hartkopp <socketcan <at> hartkopp.net> writes:

> 
> On 14.11.2012 16:39, Michael Pellegrini wrote:
> 
> > I am trying to use the pch_can driver to interface with a CANopen system.
> > I am encountering a serious issue where, at random intervals, the CAN interface
> > will lock up and stop transmitting data.  Data reception continues normally.
> > The only way I've found to resolve the issue so far is to issue an
> > "ifconfig canx down" command followed by an "ifconfig canx up" command.
> > After the interface is reset through these commands, the TX packet dropped count
> > and the TX packet carrier count increment by 1.
> > 
> > I am running Ubuntu 12.04 with kernel version 3.2.0-23.
> > 
> > Has anyone else seen this problem or have a fix for it?
> 
> Hello Mike,
> 
> i put the relevant people into CC that probably know better than me.
> 
> Btw. using the pch_can driver:
> 
> 1. Can you please provide the output of 
> 
> 	ip -details link show can0
> 
> (if can0 is your device with the problem)
> at start time and when the problem happened?

At Start Time:

3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000

After Problem Happens:

3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000

I then reset the interface via "ifconfig can0 down" and "ifconfig can0 up".

Output is:

3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
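
The three outputs above are identical, and the reported bit-timing values are consistent with each other. A minimal sketch of that arithmetic follows; the function name `can_bit_timing` is made up for illustration, and it assumes the usual convention of one sync-segment time quantum per bit with `tq` given in nanoseconds:

```python
def can_bit_timing(clock_hz, tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Derive bitrate, sample point and prescaler from CAN bit-timing values.

    Assumes one synchronisation-segment quantum per bit and tq in nanoseconds.
    """
    sync_seg = 1
    tq_per_bit = sync_seg + prop_seg + phase_seg1 + phase_seg2
    bit_time_ns = tq_per_bit * tq_ns
    bitrate = 1_000_000_000 // bit_time_ns
    # The sample point lies at the end of phase_seg1.
    sample_point = (sync_seg + prop_seg + phase_seg1) / tq_per_bit
    # Prescaler: number of controller clock cycles per time quantum.
    brp = clock_hz * tq_ns // 1_000_000_000
    return bitrate, sample_point, brp


# Values taken from the "ip -details link show can0" output above.
bitrate, sample_point, brp = can_bit_timing(
    clock_hz=50_000_000, tq_ns=500, prop_seg=3, phase_seg1=3, phase_seg2=1)
print(bitrate, sample_point, brp)
```

Under these assumptions the result matches the reported bitrate of 250000 and sample point of 0.875, so the bit-timing configuration itself does not look suspicious.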

Output of "ifconfig can0" is:

can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:1085 errors:0 dropped:769 overruns:0 frame:0
          TX packets:2039 errors:0 dropped:1 overruns:0 carrier:1
          collisions:0 txqueuelen:10
          RX bytes:1085 (1.0 KB)  TX bytes:10467 (10.4 KB)
          Interrupt:18

Notice that the TX "dropped" and "carrier" counts each incremented by 1.

> 
> 2. Can you send the output of
> 
> 	candump can0,0~0,#FFFFFFFF -e
> 
> from start time until the problem happens?

candump produces no output with the options provided.  As a sanity check, I ran 
candump with the options "can0,0:0,#FFFFFFFF -e" and got all data frames, as
expected.
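
The difference between the two filter specifications follows from candump's filter syntax: `id:mask` passes a frame when `(rx_id & mask) == (id & mask)`, `id~mask` passes it when the two sides differ, and `#mask` selects error-frame classes. A rough Python model of the data-frame part (the function `passes` is illustrative only and not part of can-utils):

```python
def passes(rx_id, filter_id, mask, inverted=False):
    """Model of a single CAN ID filter applied to a received data frame."""
    match = (rx_id & mask) == (filter_id & mask)
    # An inverted filter ("id~mask") passes exactly the frames that do not match.
    return match != inverted


sample_ids = (0x000, 0x181, 0x7FF)

# "0~0": inverted filter with mask 0, so no data frame can ever pass;
# only error frames (selected by #FFFFFFFF) would be shown.
print(any(passes(i, 0, 0, inverted=True) for i in sample_ids))

# "0:0": normal filter with mask 0, so every data frame passes.
print(all(passes(i, 0, 0) for i in sample_ids))
```

Read this way, the empty `0~0` dump would suggest that no error frames reached the socket before the stall.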

> 
> 3. It turned out that the pch_can controller is created upon the C_CAN
> CAN controller IP-core. Therefore Wolfgang posted an idea to replace the
> pch_can driver with a PCI-attached C_CAN driver:
> 
> http://thread.gmane.org/gmane.linux.can/815
> 
> I don't know the current state of the tests/replacement.
> 
> But a general C_CAN PCI driver has been contributed by Frederico Vaga which
> became a mainline driver in June
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5b92da0443c2585e31b64e86c2e1b8e22845d4bb
> 
> Don't know if this driver can be used out of the box, when the PCI IDs of the
> PCH_CAN board are added ...
> 
> But maybe the guys in CC know 
> 
> Regards,
> Oliver
> 





* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 13:13   ` Wolfgang Grandegger
@ 2012-11-15 16:23     ` Michael Pellegrini
  2012-11-15 21:19       ` Wolfgang Grandegger
  2012-11-15 16:32     ` Casper Mogensen
  2012-11-16 19:39     ` Wolfgang Grandegger
  2 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-15 16:23 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> > 
> > 3. It turned out that the pch_can controller is created upon the C_CAN
> > CAN controller IP-core. Therefore Wolfgang posted an idea to replace the
> > pch_can driver with a PCI-attached C_CAN driver:
> > 
> > http://thread.gmane.org/gmane.linux.can/815
> > 
> > I don't know the current state of the tests/replacement.
> 
> As you can see from this thread, the replacement patch did not yet
> succesfully run on such a platform. But I definitively would like to get
> rid of the obsolete pch_can driver. Testers are welcome! Michael, would
> you be able/willing to test a patch, even with a recent kernel version?

I would be willing to test a patch, however any fix would need to be back-ported
to the 3.2.0 kernel to be useful to me.  For the sake of testing though, I would
be willing to update to a newer kernel.

Also, I should warn you that I am not a Linux expert.  Thus, operations like
updating/recompiling the kernel are new to me and I'll need some help working
through the process.

> 
> > But a general C_CAN PCI driver has been contributed by Frederico Vaga which
> > became a mainline driver in June
> > 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5b92da0443c2585e31b64e86c2e1b8e22845d4bb
> > 
> > Don't know if this driver can be used out of the box, when the PCI IDs of the
> > PCH_CAN board are added ...
> > 
> > But maybe the guys in CC know 
> 
> You need a PCI driver first.
> 
> Wolfgang.
> 






* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 13:13   ` Wolfgang Grandegger
  2012-11-15 16:23     ` Michael Pellegrini
@ 2012-11-15 16:32     ` Casper Mogensen
  2012-11-15 21:16       ` Wolfgang Grandegger
  2012-11-16 19:39     ` Wolfgang Grandegger
  2 siblings, 1 reply; 79+ messages in thread
From: Casper Mogensen @ 2012-11-15 16:32 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: Oliver Hartkopp, Michael Pellegrini, linux-can, tomoya.rohm,
	Bhupesh SHARMA, Alexander Stein, federico.vaga,
	Giancarlo Asnaghi

Hi all

I worked with the EG20T chipset and the pch_can driver a lot until
January this year, when the project I was working on was
unfortunately shut down. There is a bug in the implementation
that causes the transmit buffers to fill up until all of them are
unavailable. It happens randomly, but is easily triggered under high
load. I experienced the same problems as Michael.

I have not worked on it for a long time, so I don't recall the
problem precisely, but as I remember it, there are two memory areas
that are used for communication between the processor and the CAN
core. In the pch_can driver, one is used for receive and one for
transmit. When a transmit is initiated, a flag indicates that an
interrupt must be generated upon transmit completion. If a transmit
interrupt is handled during an ongoing transmit, problems can
occur.

From pch_xmit in pch_can:
on line 940 in pch_can.c: can_put_echo_skb is called, which occupies
the skb (which, to the best of my knowledge, is the reason you get a
buffer overflow)
on line 943 in pch_can.c: the transmit-complete interrupt flag is
written to the internal register (but not yet to the CAN core)
on line 946 in pch_can.c: pch_can_rw_msg_obj is issued, which writes
the internal registers to the CAN core.

If the transmit-complete handler runs between lines 943 and 946, the
pch_tx_complete routine clears the transmit interrupt enable flag in
priv->regs->ifregs[1] (the same register is used in both paths). You
then end up writing the message to the core without the
transmit-complete interrupt enabled and with an occupied skb. Since the
skbs are freed in pch_tx_complete, which is triggered by the
transmit-complete interrupt, you eventually run out of transmit
buffers.
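
The interleaving described above can be replayed with a small toy model. This is only a Python sketch of the suspected race, not driver code: `ToyTxPath`, `TXIE`, `xmit`, `tx_complete` and `run` are stand-ins invented for illustration, and the single `shared_reg` attribute plays the role of priv->regs->ifregs[1].

```python
TXIE = 0x1  # stand-in for the "interrupt on TX complete" flag


class ToyTxPath:
    """Toy model: xmit and the TX-complete handler share one register."""

    def __init__(self):
        self.shared_reg = 0   # plays the role of the shared interface register
        self.echo_skbs = 0    # echo skbs still waiting to be freed

    def tx_complete(self):
        # The handler clears the flag in the shared register and frees one skb.
        self.shared_reg &= ~TXIE
        self.echo_skbs -= 1

    def xmit(self, irq_in_between=False):
        self.echo_skbs += 1            # like can_put_echo_skb(): skb occupied
        self.shared_reg |= TXIE        # flag set in the register only
        if irq_in_between:
            self.tx_complete()         # completion of an earlier frame
        # The register contents are now copied to the CAN core; the return
        # value says whether a completion interrupt was actually requested.
        return bool(self.shared_reg & TXIE)


def run(irq_in_between):
    """Send frames A and B; return the number of skbs that are never freed."""
    path = ToyTxPath()
    path.xmit()                                       # frame A
    if irq_in_between:
        # A's completion interrupt fires in the middle of B's xmit.
        irq_armed_b = path.xmit(irq_in_between=True)
    else:
        path.tx_complete()                            # A completes first
        irq_armed_b = path.xmit()
    if irq_armed_b:
        path.tx_complete()   # B's completion interrupt fires and frees its skb
    return path.echo_skbs


print(run(irq_in_between=False))
print(run(irq_in_between=True))
```

In this model the well-ordered run leaves no skb behind, while the interleaved run leaves one skb occupied forever; repeated often enough, that would exhaust the transmit buffers.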

I am a little rusty on this issue, as it is quite a long time since I
worked with it, but I hope the description is understandable :-)

Regards

Casper Mogensen



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 12:35 ` Steffen Rose
@ 2012-11-15 18:26   ` Michael Pellegrini
  2012-11-16  8:24     ` Steffen Rose
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-15 18:26 UTC (permalink / raw)
  To: linux-can

Steffen Rose <ro <at> emtas.de> writes:

> 
> > I am encountering a serious issue where, at random intervals, the CAN
> > interface will lock up and stop transmitting data.  Data reception
> > continues normally. 
> 
> Hello,
> 
> the situation is like my described situation at the 8.10.
> "unsure - socketCAN blocking"
> 
> socketCAN with EMS-USB interface
> 3.2.0-32-generic 64bit
> running in a virtualBox
> ip link add dev can0 type can
> 
> sometimes:
> Transmission stop
> Reception is working
> 

Hi Steffen,

I saw your post and tried the solutions proposed by Marc, which are copied
below:

"
Your program probably produces CAN frames faster than your CAN hardware
can send them. Your choices are:
1) If the write/sendto system call returns with -ENOBUFS,
   add the file descriptor to poll/select and wait for it to become
   writable again. For some known, but not yet resolved reasons (a.k.a. bug),
   the poll/select will return immediately, but the write will fail
   again. To work around this, usleep for some milliseconds between
   the write/sendto and the poll/select system call.

2) Increase the txqueue length to 10000 or so.
   (ifconfig can0 txqueuelen 10000)
   This avoids triggering the above bug; your write/sendto will
   block until the queues are free again and your CAN frame will be added
   to the send queue. With this workaround poll/select work as expected,
   i.e. when they return the socket is writable without problems.

Hope that helps,
Marc
"
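
Option 1 in the quoted advice could look roughly like the following. It is a sketch only: `send_with_retry`, `FlakySender` and their parameters are invented here, `send` is any callable that writes one frame (for a raw CAN socket it could be the socket's `send` method), and the sleep stands in for the usleep workaround; the poll/select step is left out.

```python
import errno
import time


def send_with_retry(send, frame, retries=50, delay_s=0.005):
    """Call send(frame); on ENOBUFS sleep briefly and try again.

    Returns the number of attempts used; raises if the queue never drains.
    """
    for attempt in range(1, retries + 1):
        try:
            send(frame)
            return attempt
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise              # unrelated error: do not hide it
            time.sleep(delay_s)    # give the TX queue time to drain
    raise TimeoutError("TX queue still full after %d attempts" % retries)


class FlakySender:
    """Fake sender that fails with ENOBUFS a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    def __call__(self, frame):
        if self.failures > 0:
            self.failures -= 1
            raise OSError(errno.ENOBUFS, "No buffer space available")
        return len(frame)


# Two ENOBUFS failures, then success on the third attempt.
print(send_with_retry(FlakySender(2), b"\x00" * 16, delay_s=0))
```

If the queue never drains, as in the stall discussed in this thread, the bounded retry count at least turns a silent hang into a visible error.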

I added a select call before the write to check that the file descriptor was
available for writing.  As expected, I hit the bug where the select call
indicated the file descriptor was available for writing but the write failed.

Increasing the tx queue length to 10000 resolved the select bug, but not the
data transmission bug.  Now, when the data transmission bug occurs, the select
call indicates that the file descriptor is not available for writing.

- Mike 




* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 16:32     ` Casper Mogensen
@ 2012-11-15 21:16       ` Wolfgang Grandegger
  0 siblings, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-15 21:16 UTC (permalink / raw)
  To: Casper Mogensen
  Cc: Oliver Hartkopp, Michael Pellegrini, linux-can, tomoya.rohm,
	Bhupesh SHARMA, Alexander Stein, federico.vaga,
	Giancarlo Asnaghi

Hi Casper,

On 11/15/2012 05:32 PM, Casper Mogensen wrote:
> Hi all
> 
> I have been working with the eg20t chipset and the pch_can driver a
> lot up until January this year, where the project i was working on
> unfortunately was shut down. There is a bug in the implementation,
> which causes the transmit buffers to fill up and all become
> unavailable. It happens randomly, but is easily triggered with a high
> load. I experienced the same problems as Michael.
> 
> I have not been working on it for a long time, so i don't recall the
> problem precisely, but as i remember there is two memory areas which
> is used for communication between the processor and the can core. One
> is used for receive, and one is used for transmit in the pch_can
> driver. When initiating a transmit, a flag is indicating that an
> interrupt must be generated upon transmit receive, if a transmit
> interrupt is handled during an ongoing transmit, then problems can
> occur.
> 
> From pch_xmit in pch_can
> on line 940 in pch_can.c: can_put_echo_skb is called, which occupies
> the skb(which to my best knowledge, is the reason that you get a
> buffer overflow)
> on line 943 in pch_can.c: The transmit complete interrupt flag is
> written to the internal register (but not writing to the can core yet)
> on line 946 in pch_can.c: pch_can_rw_msg_obj is issued, which writes
> the internal registers to can core.
> 
> If the transmit completed handler has been running between lines 943
> or 946, the pch_tx_complete routine will clear the transmit interupt
> enable flag in priv->regs->ifregs[1] (same register is used in both),
> then you end up writing something the message to the core without
> transmit completed interrupt enabled, and with an occupied skb, then
> you eventually runs out of transmit buffers, as the skb's are free'd
> in pch_tx_complete, which is triggered by a transmit completed
> interrupt
> 
> I am a little rusty in this issue, as it is quite a long time ago i
> was working with it, but i hope the description is understandable :-)

Thanks for the info. This confirms that there is a bug
somewhere in the driver. Instead of chasing it, I prefer switching to
the C_CAN driver first.

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 16:23     ` Michael Pellegrini
@ 2012-11-15 21:19       ` Wolfgang Grandegger
  2012-11-15 21:34         ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-15 21:19 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/15/2012 05:23 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>>>
>>> 3. It turned out that the pch_can controller is created upon the C_CAN
>>> CAN controller IP-core. Therefore Wolfgang posted an idea to replace the
>>> pch_can driver with a PCI-attached C_CAN driver:
>>>
>>> http://thread.gmane.org/gmane.linux.can/815
>>>
>>> I don't know the current state of the tests/replacement.
>>
>> As you can see from this thread, the replacement patch did not yet
>> succesfully run on such a platform. But I definitively would like to get
>> rid of the obsolete pch_can driver. Testers are welcome! Michael, would
>> you be able/willing to test a patch, even with a recent kernel version?
> 
> I would be willing to test a patch, however any fix would need to be back-ported
> to the 3.2.0 kernel to be useful to me.  For the sake of testing though, I would
> be willing to update to a newer kernel.

I need to check first if this driver can be supported with little effort.

> Also, I should warn you that I am not a Linux expert.  Thus, operations like
> updating/recompiling the kernel are new to me and I'll need some help working
> through the process.

Of course! You are welcome. What Linux distribution do you use?

Wolfgang.


* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 21:19       ` Wolfgang Grandegger
@ 2012-11-15 21:34         ` Michael Pellegrini
  2012-11-15 21:51           ` Wolfgang Grandegger
  2012-11-18 22:22           ` Wolfgang Grandegger
  0 siblings, 2 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-15 21:34 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> I need to check first if this driver can be supported with little effort.

Thank you for your consideration.

> 
> > Also, I should warn you that I am not a Linux expert.  Thus, operations like
> > updating/recompiling the kernel are new to me and I'll need some help working
> > through the process.
> 
> Of course! You are welcome. What Linux distribution do you use?

I am running Ubuntu 12.04 LTS

- Mike






* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 21:34         ` Michael Pellegrini
@ 2012-11-15 21:51           ` Wolfgang Grandegger
  2012-11-18 22:22           ` Wolfgang Grandegger
  1 sibling, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-15 21:51 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/15/2012 10:34 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> I need to check first if this driver can be supported with little effort.
> 
> Thank you for your consideration.
> 
>>
>>> Also, I should warn you that I am not a Linux expert.  Thus, operations like
>>> updating/recompiling the kernel are new to me and I'll need some help working
>>> through the process.
>>
>> Of course! You are welcome. What Linux distribution do you use?
> 
> I am running Ubuntu 12.04 LTS

Perfect! That's what I use as well. More soon...

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 18:26   ` Michael Pellegrini
@ 2012-11-16  8:24     ` Steffen Rose
  0 siblings, 0 replies; 79+ messages in thread
From: Steffen Rose @ 2012-11-16  8:24 UTC (permalink / raw)
  To: linux-can

Hello,

On Thursday, 15 November 2012, 18:26:39, Michael Pellegrini wrote:
> I saw your post and tried the solutions proposed by Marc, which are copied
> below:
> 
> "
> Your program probably produces CAN frames faster than they could be
> delivered by your CAN hardware. Your choices are:

The proposed solutions did not solve my problem; I hope I mentioned that. At
that point my assumption was that I had a problem with the USB connection.

After reading this problem description, I think it is a general problem that is
not limited to the pch_can driver, because I use a different driver and see the
same problem (from the user's point of view ;-) ).

-- 
Best regards
Steffen Rose



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 13:13   ` Wolfgang Grandegger
  2012-11-15 16:23     ` Michael Pellegrini
  2012-11-15 16:32     ` Casper Mogensen
@ 2012-11-16 19:39     ` Wolfgang Grandegger
  2 siblings, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-16 19:39 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Michael Pellegrini, linux-can, tomoya.rohm, Bhupesh SHARMA,
	Alexander Stein, federico.vaga, Giancarlo Asnaghi

Hi Oliver,

On 11/15/2012 02:13 PM, Wolfgang Grandegger wrote:
> On 11/15/2012 08:18 AM, Oliver Hartkopp wrote:
>> On 14.11.2012 16:39, Michael Pellegrini wrote:
>>
>>
>>> I am trying to use the pch_can driver to interface with a CANopen system.
>>> I am encountering a serious issue where, at random intervals, the CAN interface
>>> will lock up and stop transmitting data.  Data reception continues normally.
>>> The only way I've found to resolve the issue so far is to issue an
>>> "ifconfig canx down" command followed by an "ifconfig canx up" command.
>>> After the interface is reset through these commands, the TX packet dropped count
>>> and the TX packet carrier count increment by 1.
>>>
>>> I am running Ubuntu 12.04 with kernel version 3.2.0-23.
>>>
>>> Has anyone else seen this problem or have a fix for it?
>>
>>
>> Hello Mike,
>>
>> i put the relevant people into CC that probably know better than me.
>>
>> Btw. using the pch_can driver:
>>
>> 1. Can you please provide the output of 
>>
>> 	ip -details link show can0
>>
>> (if can0 is your device with the problem)
>> at start time and when the problem happened?
>>
>> 2. Can you send the output of
>>
>> 	candump can0,0~0,#FFFFFFFF -e
>>
>> from start time until the problem happens?
>>
>> 3. It turned out that the pch_can controller is created upon the C_CAN
>> CAN controller IP-core. Therefore Wolfgang posted an idea to replace the
>> pch_can driver with a PCI-attached C_CAN driver:
>>
>> http://thread.gmane.org/gmane.linux.can/815
>>
>> I don't know the current state of the tests/replacement.
> 
> As you can see from this thread, the replacement patch did not yet
> succesfully run on such a platform. But I definitively would like to get
> rid of the obsolete pch_can driver. Testers are welcome! Michael, would
> you be able/willing to test a patch, even with a recent kernel version?
> 
>> But a general C_CAN PCI driver has been contributed by Frederico Vaga which
>> became a mainline driver in June
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=5b92da0443c2585e31b64e86c2e1b8e22845d4bb
>>
>> Don't know if this driver can be used out of the box, when the PCI IDs of the
>> PCH_CAN board are added ...
>>
>> But maybe the guys in CC know ;-)
> 
> You need a PCI driver first.

Well, as you pointed out, there is already a PCI driver for the
[CD]_CAN... I totally forgot. That's a good starting point, and
since it already supports 32-bit alignment, not much should be
missing. I'm going to prepare a patch for the PCH_CAN for Michael.

Wolfgang.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 21:34         ` Michael Pellegrini
  2012-11-15 21:51           ` Wolfgang Grandegger
@ 2012-11-18 22:22           ` Wolfgang Grandegger
  2012-11-19 15:10             ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-18 22:22 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 4889 bytes --]

On 11/15/2012 10:34 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> I need to check first if this driver can be supported with little effort.
> 
> Thank you for your consideration.

For a simple out-of-tree build I have now extracted the most recent
versions of the c_can and c_can_pci driver files, backported them to v3.2
and added the PCH-related code.

You should be able to build these drivers on your system as shown below:

  $ tar xf c-can-pci-v1.tar.bz2
  $ cd c-can-pci-v1
  $ CONFIG_CAN_C_CAN_PCI=m \
    make -C /usr/src/linux-headers-`uname -r` SUBDIRS=`pwd` modules

This requires that you have the kernel headers, gcc and friends installed
on your system.

Then you can load the modules as shown below, after unloading the
pch_can driver and any C_CAN modules that are already loaded:

  $ sudo modprobe can_dev
  $ sudo rmmod pch_can
  $ sudo rmmod c_can_pci
  $ sudo rmmod c_can
  $ sudo insmod c_can.ko
  $ sudo insmod c_can_pci.ko

Hope it works as expected. 

Below is the patch I used for this driver:

From 562bbd0840e710d45ee211a0f99e5a5288dff4ba Mon Sep 17 00:00:00 2001
From: Wolfgang Grandegger <wg@grandegger.com>
Date: Sun, 18 Nov 2012 23:14:04 +0100
Subject: [PATCH] c_can_pci: first hack to get out-of-tree C_CAN_PCI driver to
 work

---
 c_can.c     |    3 +++
 c_can.h     |    1 +
 c_can_pci.c |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/c_can.c b/c_can.c
index e5180df..23fbcd3 100644
--- a/c_can.c
+++ b/c_can.c
@@ -1106,6 +1106,9 @@ static int c_can_open(struct net_device *dev)
 		goto exit_irq_fail;
 	}
 
+	if (priv->reset)
+		priv->reset(priv);
+
 	napi_enable(&priv->napi);
 
 	/* start the c_can controller */
diff --git a/c_can.h b/c_can.h
index e5ed41d..4bad1d4 100644
--- a/c_can.h
+++ b/c_can.h
@@ -161,6 +161,7 @@ struct c_can_priv {
 	int last_status;
 	u16 (*read_reg) (struct c_can_priv *priv, enum reg index);
 	void (*write_reg) (struct c_can_priv *priv, enum reg index, u16 val);
+	void (*reset) (struct c_can_priv *priv);
 	void __iomem *base;
 	const u16 *regs;
 	unsigned long irq_flags; /* for request_irq() */
diff --git a/c_can_pci.c b/c_can_pci.c
index f8bcd73..017d008 100644
--- a/c_can_pci.c
+++ b/c_can_pci.c
@@ -21,12 +21,17 @@
 #include "c_can.h"
 
 #ifndef PCI_DEVICE_ID_STMICRO_CAN
-#define PCI_DEVICE_ID_STMICRO_CAN 0xCC11
+#define PCI_DEVICE_ID_STMICRO_CAN	0xCC11
 #endif
+#ifndef PCI_DEVICE_ID_PCH_CAN
+#define PCI_DEVICE_ID_PCH_CAN	0x8818
+#endif
+#define PCH_PCI_SOFT_RESET	0x01fc
 
 enum c_can_pci_reg_align {
 	C_CAN_REG_ALIGN_16,
 	C_CAN_REG_ALIGN_32,
+	C_CAN_REG_32,
 };
 
 struct c_can_pci_data {
@@ -36,6 +41,8 @@ struct c_can_pci_data {
 	enum c_can_pci_reg_align reg_align;
 	/* Set the frequency */
 	unsigned int freq;
+	/* CAN reset callback */
+	void (*reset) (struct c_can_priv *priv);
 };
 
 /*
@@ -68,6 +75,27 @@ static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
 	writew(val, priv->base + 2 * priv->regs[index]);
 }
 
+static u16 c_can_pci_read_reg_32bit(struct c_can_priv *priv,
+				    enum reg index)
+{
+	return ioread32(priv->base + 4 * priv->regs[index]);
+}
+
+static void c_can_pci_write_reg_32bit(struct c_can_priv *priv,
+				      enum reg index, u16 val)
+{
+	iowrite32(val, priv->base + 4 * priv->regs[index]);
+}
+
+static void c_can_pci_reset_pch(struct c_can_priv *priv)
+{
+	u32 __iomem *addr = (u32 __iomem *)(priv->regs + PCH_PCI_SOFT_RESET);
+
+	/* write to sw reset register */
+	iowrite32(1, addr);
+	iowrite32(0, addr);
+}
+
 static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 				     const struct pci_device_id *ent)
 {
@@ -147,11 +175,17 @@ static int __devinit c_can_pci_probe(struct pci_dev *pdev,
 		priv->read_reg = c_can_pci_read_reg_aligned_to_16bit;
 		priv->write_reg = c_can_pci_write_reg_aligned_to_16bit;
 		break;
+	case C_CAN_REG_32:
+		priv->read_reg = c_can_pci_read_reg_32bit;
+		priv->write_reg = c_can_pci_write_reg_32bit;
+		break;
 	default:
 		ret = -EINVAL;
 		goto out_free_c_can;
 	}
 
+	priv->reset = c_can_pci_data->reset;
+
 	ret = register_c_can_dev(dev);
 	if (ret) {
 		dev_err(&pdev->dev, "registering %s failed (err=%d)\n",
@@ -202,6 +236,13 @@ static struct c_can_pci_data c_can_sta2x11= {
 	.freq = 52000000, /* 52 Mhz */
 };
 
+static struct c_can_pci_data c_can_pch = {
+	.type = BOSCH_C_CAN,
+	.reg_align = C_CAN_REG_32,
+	.freq = 50000000, /* 50 MHz */
+	.reset = c_can_pci_reset_pch,
+};
+
 #define C_CAN_ID(_vend, _dev, _driverdata) {		\
 	PCI_DEVICE(_vend, _dev),			\
 	.driver_data = (unsigned long)&_driverdata,	\
@@ -209,6 +250,8 @@ static struct c_can_pci_data c_can_sta2x11= {
 static DEFINE_PCI_DEVICE_TABLE(c_can_pci_tbl) = {
 	C_CAN_ID(PCI_VENDOR_ID_STMICRO, PCI_DEVICE_ID_STMICRO_CAN,
 		 c_can_sta2x11),
+	C_CAN_ID(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_PCH_CAN,
+		 c_can_pch),
 	{},
 };
 static struct pci_driver c_can_pci_driver = {
-- 
1.7.9.5

Wolfgang.

[-- Attachment #2: c-can-pci-v1.tar.bz2 --]
[-- Type: application/x-bzip, Size: 10840 bytes --]

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-18 22:22           ` Wolfgang Grandegger
@ 2012-11-19 15:10             ` Michael Pellegrini
  2012-11-19 15:26               ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-19 15:10 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> Hope it works as expected. 

I got the modules built, however loading them results in a non-functional CAN 
interface.  can0 no longer shows up in "ifconfig" or "ip".  Upon loading the 
modules, dmesg prints the following:

[ 2786.244341] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 2786.244381] c_can_pci 0000:02:0c.3: setting latency timer to 64
[ 2786.244497] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
[ 2786.244523] c_can_pci 0000:02:0c.3: device has no PCI memory resources, failing adapter
[ 2786.244586] c_can_pci 0000:02:0c.3: PCI INT C disabled
[ 2786.244611] c_can_pci: probe of 0000:02:0c.3 failed with error -12

To load the modules, I executed the following commands:

sudo rmmod c_can
sudo rmmod pch_can
sudo insmod ./c_can.ko
sudo insmod ./c_can_pci.ko

After loading the modules, lsmod outputs the following:

Module                  Size  Used by
c_can_pci              13009  0
c_can                  17867  1 c_can_pci
can_raw                12864  0
can                    27623  1 can_raw
i2c_isch               12662  0
snd_hda_codec_realtek   174055  1
snd_hda_intel          32765  3
snd_hda_codec         109562  2 snd_hda_codec_realtek,snd_hda_intel
snd_hwdep              13276  1 snd_hda_codec
snd_pcm                80845  2 snd_hda_intel,snd_hda_codec
snd_seq_midi           13132  0
snd_rawmidi            25424  1 snd_seq_midi
snd_seq_midi_event     14475  1 snd_seq_midi
snd_seq                51567  2 snd_seq_midi,snd_seq_midi_event
snd_timer              28931  2 snd_pcm,snd_seq
snd_seq_device         14172  3 snd_seq_midi,snd_rawmidi,snd_seq
snd                    62064  15 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
soundcore              14635  1 snd
snd_page_alloc         14108  2 snd_hda_intel,snd_pcm
lpc_sch                12720  0
ftdi_sio               35859  0
usbserial              37173  1 ftdi_sio
can_dev                14748  1 c_can
bnep                   17830  2
rfcomm                 38139  0
bluetooth             158438  10 bnep,rfcomm
parport_pc             32114  0
ppdev                  12849  0
video                  19068  0
mac_hid                13077  0
pch_phub               13207  0
gpio_pch               13126  0
shpchp                 32325  0
lp                     17455  0
parport                40930  3 parport_pc,ppdev,lp
sdhci_pci              18324  0
sdhci                  28241  1 sdhci_pci
pch_gbe                38415  0

Is there anything else I should try, or do you have to make changes on your end?

Thanks,
Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 15:10             ` Michael Pellegrini
@ 2012-11-19 15:26               ` Wolfgang Grandegger
  2012-11-19 16:20                 ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-19 15:26 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/19/2012 04:10 PM, Michael Pellegrini wrote:
> I got the modules built, however loading them results in a non-functional CAN 
> interface.  can0 no longer shows up in "ifconfig" or "ip".  Upon loading the 
> modules, dmesg prints the following:
> 
> [ 2786.244341] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> [ 2786.244381] c_can_pci 0000:02:0c.3: setting latency timer to 64
> [ 2786.244497] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
> [ 2786.244523] c_can_pci 0000:02:0c.3: device has no PCI memory resources, failing adapter
> [ 2786.244586] c_can_pci 0000:02:0c.3: PCI INT C disabled
> [ 2786.244611] c_can_pci: probe of 0000:02:0c.3 failed with error -12

"-ENOMEM", ah, the PCH uses PCI Bar 1 (instead of 0). I missed that.
Could you please replace:

	addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));

with

	addr = pci_iomap(pdev, 1, pci_resource_len(pdev, 0));

in c_can_pci and retry.
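
For reference, the "-12" in the probe failure is a negated errno value, which is how it maps to ENOMEM. A minimal user-space sketch, assuming a Linux system where ENOMEM is 12 (the helper name probe_error_name is made up for illustration and is not part of the driver):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: kernel probe functions return negative errno
 * values, so negate the code before looking up its description. */
static const char *probe_error_name(int ret)
{
	return (ret < 0) ? strerror(-ret) : "success";
}
```

Calling probe_error_name(-12) returns the C library's description of ENOMEM on such a system.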

Wolfgang.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 15:26               ` Wolfgang Grandegger
@ 2012-11-19 16:20                 ` Michael Pellegrini
  2012-11-19 16:31                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-19 16:20 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> "-ENOMEM", ah, the PCH uses PCI Bar 1 (instead of 0). I missed that.
> Could you please replace:
> 
> 	addr = pci_iomap(pdev, 0, pci_resource_len(pdev, 0));
> 
> with
> 
> 	addr = pci_iomap(pdev, 1, pci_resource_len(pdev, 0));
> 
> in c_can_pci and retry.
> 
> Wolfgang.

I made the change, the module crashes on load now.  Dmesg output is:

[  249.244698] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[  249.244737] c_can_pci 0000:02:0c.3: setting latency timer to 64
[  249.244869] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
[  249.339061] BUG: unable to handle kernel paging request at f85af538
[  249.339078] IP: [<c12bc410>] iowrite32+0x30/0x40
[  249.339098] *pdpt = 0000000001938001 *pde = 0000000035822067 *pte = 80000000228cc161
[  249.339113] Oops: 0003 [#1] SMP
[  249.339122] Modules linked in: c_can_pci(O) c_can(O) can_raw can i2c_isch snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer ftdi_sio snd_seq_device usbserial snd soundcore snd_page_alloc lpc_sch can_dev pch_phub rfcomm gpio_pch bnep bluetooth parport_pc ppdev video mac_hid shpchp lp parport sdhci_pci sdhci pch_gbe [last unloaded: pch_can]
[  249.339206]
[  249.339214] Pid: 2365, comm: ifconfig Tainted: G           O 3.2.0-23-generic-pae #36-Ubuntu To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
[  249.339231] EIP: 0060:[<c12bc410>] EFLAGS: 00010296 CPU: 1
[  249.339240] EIP is at iowrite32+0x30/0x40
[  249.339247] EAX: 00000001 EBX: f85af538 ECX: f85af538 EDX: f85af538
[  249.339255] ESI: 00000000 EDI: 00000000 EBP: e290be5c ESP: e290be5c
[  249.339263]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  249.339271] Process ifconfig (pid: 2365, ti=e290a000 task=e721cbc0 task.ti=e290a000)
[  249.339278] Stack:
[  249.339282]  e290be68 f85ae0b1 f2640000 e290be88 f85bdbd6 00000080 f2640000 f2640000
[  249.339304]  f2640000 f85bf220 00000000 e290bea4 c14ad8f3 c15a7e06 e290bea4 c14ad85e
[  249.339325]  f2640000 000000c1 e290becc c14adb62 000000c1 fffffff3 e290bec4 00040080
[  249.339346] Call Trace:
[  249.339361]  [<f85ae0b1>] c_can_pci_reset_pch+0x21/0x30 [c_can_pci]
[  249.339374]  [<f85bdbd6>] c_can_open+0xa6/0x100 [c_can]
[  249.339387]  [<c14ad8f3>] __dev_open+0x83/0xd0
[  249.339400]  [<c15a7e06>] ? _raw_spin_unlock_bh+0x16/0x20
[  249.339410]  [<c14ad85e>] ? dev_set_rx_mode+0x2e/0x40
[  249.339421]  [<c14adb62>] __dev_change_flags+0x82/0x150
[  249.339431]  [<c14adcd1>] dev_change_flags+0x21/0x60
[  249.339443]  [<c1504f6c>] devinet_ioctl+0x4dc/0x640
[  249.339453]  [<c14ae26e>] ? dev_ioctl+0x1de/0x2f0
[  249.339463]  [<c150551d>] inet_ioctl+0x8d/0xb0
[  249.339475]  [<c149887d>] sock_ioctl+0x6d/0x290
[  249.339485]  [<c1498810>] ? move_addr_to_user+0x90/0x90
[  249.339497]  [<c1154869>] do_vfs_ioctl+0x79/0x2d0
[  249.339507]  [<c15a8ea0>] ? do_debug+0x180/0x180
[  249.339518]  [<c10197cb>] ? fpu_finit+0x2b/0x70
[  249.339528]  [<c1154b2f>] sys_ioctl+0x6f/0x80
[  249.339539]  [<c1012474>] ? math_state_restore+0x44/0x60
[  249.339551]  [<c15af11f>] sysenter_do_call+0x12/0x28
[  249.339557] Code: 03 00 89 e5 89 d1 77 23 81 fa 00 00 01 00 76 0b 81 e2 ff ff 00 00 ef 5d c3 66 90 ba d8 94 77 c1 89 c8 e8 44 ff ff ff 5d c3 66 90 <89> 02 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 81 fa ff ff
[  249.339675] EIP: [<c12bc410>] iowrite32+0x30/0x40 SS:ESP 0068:e290be5c
[  249.339688] CR2: 00000000f85af538
[  249.339697] ---[ end trace ee92cde2e8156087 ]---
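
As a reading aid for the trace above: the "Oops: 0003" value is the x86 page-fault error code. The following user-space sketch decodes its low three bits, assuming the standard x86 meaning of those bits. The struct and function names are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Low three bits of the x86 page-fault error code. */
struct pf_info {
	bool present;	/* bit 0: page was present (protection violation) */
	bool write;	/* bit 1: the faulting access was a write */
	bool user;	/* bit 2: the access came from user mode */
};

/* Hypothetical helper: split an error code into its flag bits. */
static struct pf_info decode_pf_error(unsigned long code)
{
	struct pf_info info = {
		.present = (code & 0x1) != 0,
		.write   = (code & 0x2) != 0,
		.user    = (code & 0x4) != 0,
	};
	return info;
}
```

For 0x0003 this yields present and write set with user clear: a kernel-mode write to a page that is mapped but not writable. That would be consistent with iowrite32() being handed a pointer into ordinary kernel or module memory rather than into the device's register window, though the trace alone does not prove it.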




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 16:20                 ` Michael Pellegrini
@ 2012-11-19 16:31                   ` Wolfgang Grandegger
  2012-11-19 17:39                     ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-19 16:31 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/19/2012 05:20 PM, Michael Pellegrini wrote:
> I made the change, the module crashes on load now.  Dmesg output is:

Grrr, sorry, it should be :

	addr = pci_iomap(pdev, 1, pci_resource_len(pdev, 1));
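
Why the BAR index has to match in both arguments can be sketched with a small user-space model of how pci_iomap() chooses the mapping length. This is a simplification (the real function also checks the resource start and flags), and the BAR sizes used in any call are made up; model_iomap_len and NUM_BARS are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BARS 6

/* User-space model of pci_iomap()'s length handling. Returns the
 * number of bytes that would be mapped, or 0 where the kernel would
 * return NULL. bar_len[] stands in for pci_resource_len(). */
static size_t model_iomap_len(const size_t bar_len[NUM_BARS], int bar,
			      size_t maxlen)
{
	size_t len;

	if (bar < 0 || bar >= NUM_BARS)
		return 0;
	len = bar_len[bar];
	if (len == 0)
		return 0;		/* nothing behind this BAR */
	if (maxlen != 0 && len > maxlen)
		len = maxlen;		/* caller asked for less */
	return len;
}
```

In this model, mapping an empty BAR fails outright, and mapping BAR 1 with the length of a smaller, non-empty BAR 0 would silently truncate the window. Passing the same index to both calls avoids depending on what BAR 0 happens to contain.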

Wolfgang.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 16:31                   ` Wolfgang Grandegger
@ 2012-11-19 17:39                     ` Michael Pellegrini
  2012-11-19 19:22                       ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-19 17:39 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Grrr, sorry, it should be :
> 
> 	addr = pci_iomap(pdev, 1, pci_resource_len(pdev, 1));
> 
> Wolfgang.

Made the change, same problem.  Dmesg output is:

[ 5015.918981] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 5015.919047] c_can_pci 0000:02:0c.3: setting latency timer to 64
[ 5015.919328] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
[ 5016.014291] BUG: unable to handle kernel paging request at f85a0538
[ 5016.014322] IP: [<c12bc410>] iowrite32+0x30/0x40
[ 5016.014360] *pdpt = 0000000001938001 *pde = 0000000035822067 *pte = 8000000022ec8161
[ 5016.014391] Oops: 0003 [#1] SMP
[ 5016.014410] Modules linked in: c_can_pci(O) c_can(O) can_raw can i2c_isch snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd ftdi_sio usbserial soundcore lpc_sch snd_page_alloc can_dev rfcomm bnep parport_pc bluetooth ppdev gpio_pch video mac_hid pch_phub shpchp lp parport sdhci_pci sdhci pch_gbe [last unloaded: pch_can]
[ 5016.014588]
[ 5016.014606] Pid: 3335, comm: ifconfig Tainted: G           O 3.2.0-23-generic-pae #36-Ubuntu To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
[ 5016.014642] EIP: 0060:[<c12bc410>] EFLAGS: 00010296 CPU: 0
[ 5016.014661] EIP is at iowrite32+0x30/0x40
[ 5016.014677] EAX: 00000001 EBX: f85a0538 ECX: f85a0538 EDX: f85a0538
[ 5016.014693] ESI: 00000000 EDI: 00000000 EBP: e2e09e5c ESP: e2e09e5c
[ 5016.014710]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 5016.014728] Process ifconfig (pid: 3335, ti=e2e08000 task=ec73cbc0 task.ti=e2e08000)
[ 5016.014741] Stack:
[ 5016.014751]  e2e09e68 f859f0b1 f2729800 e2e09e88 f84cbbd6 00000080 f2729800 f2729800
[ 5016.014798]  f2729800 f84cd220 00000000 e2e09ea4 c14ad8f3 c15a7e06 e2e09ea4 c14ad85e
[ 5016.014843]  f2729800 000000c1 e2e09ecc c14adb62 000000c1 fffffff3 e2e09ec4 00040080
[ 5016.014888] Call Trace:
[ 5016.014916]  [<f859f0b1>] c_can_pci_reset_pch+0x21/0x30 [c_can_pci]
[ 5016.014943]  [<f84cbbd6>] c_can_open+0xa6/0x100 [c_can]
[ 5016.014970]  [<c14ad8f3>] __dev_open+0x83/0xd0
[ 5016.014994]  [<c15a7e06>] ? _raw_spin_unlock_bh+0x16/0x20
[ 5016.015017]  [<c14ad85e>] ? dev_set_rx_mode+0x2e/0x40
[ 5016.015039]  [<c14adb62>] __dev_change_flags+0x82/0x150
[ 5016.015062]  [<c14adcd1>] dev_change_flags+0x21/0x60
[ 5016.015086]  [<c1504f6c>] devinet_ioctl+0x4dc/0x640
[ 5016.015107]  [<c14ae26e>] ? dev_ioctl+0x1de/0x2f0
[ 5016.015129]  [<c150551d>] inet_ioctl+0x8d/0xb0
[ 5016.015153]  [<c149887d>] sock_ioctl+0x6d/0x290
[ 5016.015175]  [<c1498810>] ? move_addr_to_user+0x90/0x90
[ 5016.015199]  [<c1154869>] do_vfs_ioctl+0x79/0x2d0
[ 5016.015221]  [<c15a8ea0>] ? do_debug+0x180/0x180
[ 5016.015244]  [<c10197cb>] ? fpu_finit+0x2b/0x70
[ 5016.015264]  [<c1154b2f>] sys_ioctl+0x6f/0x80
[ 5016.015288]  [<c1012474>] ? math_state_restore+0x44/0x60
[ 5016.015312]  [<c15af11f>] sysenter_do_call+0x12/0x28
[ 5016.015325] Code: 03 00 89 e5 89 d1 77 23 81 fa 00 00 01 00 76 0b 81 e2 ff ff 00 00 ef 5d c3 66 90 ba d8 94 77 c1 89 c8 e8 44 ff ff ff 5d c3 66 90 <89> 02 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 81 fa ff ff
[ 5016.015581] EIP: [<c12bc410>] iowrite32+0x30/0x40 SS:ESP 0068:e2e09e5c
[ 5016.015610] CR2: 00000000f85a0538
[ 5016.015627] ---[ end trace f89a7ceb8cce24a5 ]---




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 17:39                     ` Michael Pellegrini
@ 2012-11-19 19:22                       ` Wolfgang Grandegger
  2012-11-19 20:19                         ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-19 19:22 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 2905 bytes --]

On 11/19/2012 06:39 PM, Michael Pellegrini wrote:
> Made the change, same problem.  Dmesg output is:

OK, priv->regs in c_can_pci_reset_pch() is wrong. It should be
priv->base. Next try. I have attached c-can-pci-v2.tar.bz2.
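
To illustrate the difference: in the posted patch, priv->regs is declared as "const u16 *regs" (the register offset table) while priv->base is "void __iomem *base" (the mapped device window). Adding PCH_PCI_SOFT_RESET to the former scales the offset by sizeof(u16) and stays in kernel memory near the table, never reaching the device. A user-space sketch of just that pointer arithmetic (function names are invented for this sketch; only the constant comes from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCH_PCI_SOFT_RESET 0x01fc

/* Byte distance covered when the constant is added to a pointer to
 * 16-bit elements, as with priv->regs in the v1 reset function. */
static size_t reset_offset_via_u16_ptr(void)
{
	static const uint16_t table[PCH_PCI_SOFT_RESET + 1];
	const uint16_t *regs = table;

	return (size_t)((const char *)(regs + PCH_PCI_SOFT_RESET) -
			(const char *)regs);
}

/* Byte distance covered when the same constant is added to a
 * byte-granular pointer, which is how arithmetic on priv->base
 * behaves in kernel code. */
static size_t reset_offset_via_byte_ptr(void)
{
	static const unsigned char window[PCH_PCI_SOFT_RESET + 1];
	const unsigned char *base = window;

	return (size_t)((base + PCH_PCI_SOFT_RESET) - base);
}
```

The first function returns 0x3f8 and the second 0x1fc, so the v1 code wrote at 0x3f8 bytes past the start of the offset table instead of at offset 0x1fc inside the register window.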

Wolfgang.


[-- Attachment #2: c-can-pci-v2.tar.bz2 --]
[-- Type: application/x-bzip, Size: 11986 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 19:22                       ` Wolfgang Grandegger
@ 2012-11-19 20:19                         ` Michael Pellegrini
  2012-11-19 21:46                           ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-19 20:19 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> OK, priv->regs in c_can_pci_reset_pch() is wrong. It should be
> priv->base. Next try. I have attached c-can-pci-v2.tar.bz2.
> 
> Wolfgang.
> 
> 
> Attachment (c-can-pci-v2.tar.bz2): application/x-bzip, 11 KiB

My system really didn't like some part of the changes.  Once I loaded
c_can_pci.ko, I was locked out of all input, command line & graphical.
I had to hard-reset the system to recover.





^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 20:19                         ` Michael Pellegrini
@ 2012-11-19 21:46                           ` Wolfgang Grandegger
  2012-11-20 14:25                             ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-19 21:46 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

On 11/19/2012 09:19 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>>
>> OK, priv->regs in c_can_pci_reset_pch() is wrong. It should be
>> priv->base. Next try. I have attached c-can-pci-v2.tar.bz2.
>>
>> Wolfgang.
>>
>>
>> Attachment (c-can-pci-v2.tar.bz2): application/x-bzip, 11 KiB
> 
> My system really didn't like some part of the changes.  Once I loaded
> c_can_pci.ko, I was locked out of all input, command line & graphical.
> I had to hard-reset the system to recover.

There is obviously still a problem when insmoding the driver. The only
significant difference vs. the pch_probe() is the line:

  pci_set_master(pdev);

which I have commented out. Furthermore I fixed another problem with the
register access (the shift was wrong). Please retry with the attached
c-can-pci-v3.tar.bz2.

Wolfgang.

[-- Attachment #2: c-can-pci-v3.tar.bz2 --]
[-- Type: application/x-bzip, Size: 12019 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-19 21:46                           ` Wolfgang Grandegger
@ 2012-11-20 14:25                             ` Michael Pellegrini
  2012-11-20 16:12                               ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-20 14:25 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> 
> There is obviously still a problem when insmoding the driver. The only
> significant difference vs. the pch_probe() is the line:
> 
>   pci_set_master(pdev);
> 
> which I have commented out. Furthermore I fixed another problem with the
> register access (the shift was wrong). Please retry with the attached
> c-can-pci-v3.tar.bz2.
> 
> Wolfgang.
> 
> Attachment (c-can-pci-v3.tar.bz2): application/x-bzip, 11 KiB

The new driver loads!  After loading the driver, I am able to see the CAN 
interface in ifconfig.  Dmesg output after loading is:

[ 4285.596420] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 4285.597107] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
[ 4285.597135] c_can_pci_probe: bar=1
[ 4285.597191] c_can_pci_probe: addr=f8450000
[ 4285.685151] c_can_pci_reset_pch: addr=f84501fc
[ 4285.685332] c_can_pci 0000:02:0c.3: can0: setting BTR=0518 BRPE=0000

However, everything isn't fixed yet.  After sending a handful of messages, the 
CAN interface hits the same problem that started this thread.  My application 
prints the error "No buffer space available" and the only way to recover is an 
interface reset via "ifconfig can0 down" followed by "ifconfig can0 up".

Is there any information you need me to capture to assist you in debugging?

- Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-20 14:25                             ` Michael Pellegrini
@ 2012-11-20 16:12                               ` Wolfgang Grandegger
  2012-11-20 19:12                                 ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-20 16:12 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/20/2012 03:25 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>>
>> There is obviously still a problem when insmoding the driver. The only
>> significant difference vs. the pch_probe() is the line:
>>
>>   pci_set_master(pdev);
>>
>> which I have commented out. Furthermore I fixed another problem with the
>> register access (the shift was wrong). Please retry with the attached
>> c-can-pci-v3.tar.bz2.
>>
>> Wolfgang.
>>
>> Attachment (c-can-pci-v3.tar.bz2): application/x-bzip, 11 KiB
> 
> The new driver loads!  After loading the driver, I am able to see the CAN 
> interface in ifconfig.  Dmesg output after loading is:

Good news. Thanks for your patience.

> [ 4285.596420] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> [ 4285.597107] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
> [ 4285.597135] c_can_pci_probe: bar=1
> [ 4285.597191] c_can_pci_probe: addr=f8450000
> [ 4285.685151] c_can_pci_reset_pch: addr=f84501fc
> [ 4285.685332] c_can_pci 0000:02:0c.3: can0: setting BTR=0518 BRPE=0000
> 
> However, everything isn't fixed yet.  After sending a handful of messages, the 
> CAN interface hits the same problem that started this thread.  My application 
> prints the error "No buffer space available" and the only way to recover is an 
> interface reset via "ifconfig can0 down" followed by "ifconfig can0 up".

Could you please send the output of "candump any,0:0,#FFFFFFFF" while
sending the messages? Do they go out to the bus? The output of
"ip -d -s link show can0" would also be useful.

> Is there any information you need me to capture to assist you in debugging?

See above, for today.

Wolfgang.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-15 16:12   ` Michael Pellegrini
@ 2012-11-20 18:59     ` Wolfgang Grandegger
  0 siblings, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-20 18:59 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

Looking at your previous messages...

On 11/15/2012 05:12 PM, Michael Pellegrini wrote:
> Oliver Hartkopp <socketcan <at> hartkopp.net> writes:
> 
>>
>> On 14.11.2012 16:39, Michael Pellegrini wrote:
>>
>>> I am trying to use the pch_can driver to interface with a CANopen system.
>>> I am encountering a serious issue where, at random intervals, the CAN interface
>>> will lock up and stop transmitting data.  Data reception continues normally.
>>> The only way I've found to resolve the issue so far is to issue an
>>> "ifconfig canx down" command followed by an "ifconfig canx up" command.
>>> After the interface is reset through these commands, the TX packet dropped count
>>> and the TX packet carrier count increment by 1.
>>>
>>> I am running Ubuntu 12.04 with kernel version 3.2.0-23.
>>>
>>> Has anyone else seen this problem or have a fix for it?
>>
>> Hello Mike,
>>
>> I put the relevant people in CC who probably know better than me.
>>
>> Btw. using the pch_can driver:
>>
>> 1. Can you please provide the output of 
>>
>> 	ip -details link show can0
>>
>> (if can0 is your device with the problem)
>> at start time and when the problem happened?
> 
> At Start Time:
> 
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
> 
> After Problem Happens:
> 
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
> 
> I then reset the interface via "ifconfig can0 down" and "ifconfig can0 up".
> 
> Output is:
> 
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     pch_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
> 
> Output of "ifconfig can0" is:
> 
> can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           UP RUNNING NOARP  MTU:16  Metric:1
>           RX packets:1085 errors:0 dropped:769 overruns:0 frame:0
>           TX packets:2039 errors:0 dropped:1 overruns:0 carrier:1
>           collisions:0 txqueuelen:10
>           RX bytes:1085 (1.0 KB)  TX bytes:10467 (10.4 KB)
>           Interrupt:18
> 
> Notice that the "dropped" and "carrier" count incremented by 1.

This means that the device was closed with TX messages still pending.
Actually, the TX-done interrupt never came; somehow the message did not
go out to the bus.

Wolfgang.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-20 16:12                               ` Wolfgang Grandegger
@ 2012-11-20 19:12                                 ` Michael Pellegrini
  2012-11-20 21:05                                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-20 19:12 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> > 
> > However, everything isn't fixed yet.  After sending a handful of messages, the
> > CAN interface hits the same problem that started this thread.  My application
> > prints the error "No buffer space available" and the only way to recover is an
> > interface reset via "ifconfig can0 down" followed by "ifconfig can0 up".
> 
> Could you please send the output of "candump any,0:0,#FFFFFFFF" while
> sending the messages. Do they go out to the bus? Also "ip -d -s link
> show can0" would be useful.

candump produced no output while my application was running and trying to send
messages.  Furthermore, the lack of output from candump suggests that messages
aren't being received either.  There is another CAN device on the bus and I can
confirm, using a CAN monitor tool on another PC, that it is actively sending
messages.

Upon starting my application, 16 messages are transmitted on the bus (confirmed
via the external CAN monitor tool) which candump does not report.  Then
transmission stops.  This lines up with the output of "ip -d -s link show can0",
shown below:

At startup:

4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

After transmission stops:

4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

After transmission stops & interface is reset:

4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       16      0       0

As best I can tell, the 16 messages which are successfully transmitted on the
bus are considered dropped by the driver.

- Mike
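
For anyone repeating this comparison, the counter rows can be checked
mechanically. Below is a minimal userspace sketch in C, assuming the value
row that follows the "TX: bytes packets errors dropped carrier collsns"
header has already been extracted as a string. The names tx_stats,
parse_tx_row() and dropped_without_tx() are hypothetical helpers made up for
illustration; they are not part of iproute2 or of any driver.

```c
#include <assert.h>
#include <stdio.h>

/* Counters in the order printed by "ip -s link show" for the TX row. */
struct tx_stats {
    unsigned long bytes, packets, errors, dropped, carrier, collisions;
};

/*
 * Parse the value row that follows the "TX: bytes packets ..." header.
 * Returns 0 on success, -1 if the row does not hold six numbers.
 */
int parse_tx_row(const char *row, struct tx_stats *out)
{
    assert(row != NULL && out != NULL);

    int n = sscanf(row, "%lu %lu %lu %lu %lu %lu",
                   &out->bytes, &out->packets, &out->errors,
                   &out->dropped, &out->carrier, &out->collisions);
    return n == 6 ? 0 : -1;
}

/* True when frames were counted as dropped but none as transmitted. */
int dropped_without_tx(const struct tx_stats *s)
{
    return s->dropped > 0 && s->packets == 0;
}
```

Fed the last TX row above, dropped_without_tx() returns true, which matches
the observation that the frames seen on the bus were never accounted as
transmitted.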



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-20 19:12                                 ` Michael Pellegrini
@ 2012-11-20 21:05                                   ` Wolfgang Grandegger
  2012-11-21 10:24                                     ` Wolfgang Grandegger
  2012-11-21 14:52                                     ` Michael Pellegrini
  0 siblings, 2 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-20 21:05 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

Hi Michael,

On 11/20/2012 08:12 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>>>
>>> However, everything isn't fixed yet.  After sending a handful of messages, the
>>> CAN interface hits the same problem that started this thread.  My application
>>> prints the error "No buffer space available" and the only way to recover is an
>>> interface reset via "ifconfig can0 down" followed by "ifconfig can0 up".
>>
>> Could you please send the output of "candump any,0:0,#FFFFFFFF" while
>> sending the messages. Do they go out to the bus? Also "ip -d -s link
>> show can0" would be useful.
> 
> candump produced no output while my application was running and trying to send
> messages.  Furthermore, the lack of output from candump suggests that messages
> aren't being received as well.  There is another CAN device on the bus and I can
> confirm, using a CAN monitor tool on another PC, that it is actively sending
> messages.
> 
> Upon starting my application, 16 messages are transmitted on the bus (confirmed
> via the external CAN monitor tool) which candump does not report.  Then
> transmission stops.  This lines up with the output of "ip -d -s link show can0",
> shown below:

I'm a bit confused. Where is the application running? On this
PCH-System? Or does an external node send messages, which are not
received by the PCH-C_CAN?

> At startup:
> 
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          0          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
> 
> After transmission stops:
> 
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          0          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
> 
> After transmission stops & interface is reset:
> 
> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          0          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       16      0       0
> 
> The best I can tell, the 16 messages which are successfully transmitted on the
> bus are considered dropped by the driver.

OK, we still have a problem with TX and RX, most likely with interrupts.
Could you please send a message on the PCH-System executing
"cansend can0 123#abcdef":

- Does "candump any,0:0,#FFFFFFFF" report anything?

- Does the message show up on the bus (visible to another node or
  analyser)?

- Does the interrupt count increase? Check output of /proc/interrupts.

I will prepare an instrumented driver tomorrow.

Wolfgang.
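
The interrupt check can also be scripted rather than read by eye. The
following is a minimal userspace sketch in C, assuming one line of
/proc/interrupts is passed in as a string; irq_count_for() and has_name()
are hypothetical helpers made up for illustration. It sums the per-CPU
counters of the line and returns -1 if the line does not belong to the named
device.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if tok appears in s as a whole, comma- or space-delimited name. */
static int has_name(const char *s, const char *tok)
{
    size_t len = strlen(tok);

    if (len == 0)
        return 0;
    for (const char *hit = strstr(s, tok); hit != NULL;
         hit = strstr(hit + 1, tok)) {
        int start_ok = hit == s || hit[-1] == ' ' || hit[-1] == '\t' ||
                       hit[-1] == ',';
        char after = hit[len];
        int end_ok = after == '\0' || after == '\n' || after == ',' ||
                     after == ' ';
        if (start_ok && end_ok)
            return 1;
    }
    return 0;
}

/*
 * Sum the per-CPU counters on one /proc/interrupts line, e.g.
 * " 41:          0          0   PCI-MSI-edge      can0".
 * Returns -1 if the line is not for the device named dev.
 */
long irq_count_for(const char *line, const char *dev)
{
    assert(line != NULL && dev != NULL);

    const char *p = strchr(line, ':');   /* end of the IRQ number */
    if (p == NULL)
        return -1;
    p++;

    long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                       /* reached the chip name */
        char *end;
        long n = strtol(p, &end, 10);
        if (*end != ' ' && *end != '\t' && *end != '\n' && *end != '\0')
            break;                       /* token like "2-edge", not a counter */
        total += n;
        p = end;
    }
    return has_name(p, dev) ? total : -1;
}
```

Calling it on the can0 line before and after a "cansend" and comparing the
two results shows whether the interrupt count increased.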

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-20 21:05                                   ` Wolfgang Grandegger
@ 2012-11-21 10:24                                     ` Wolfgang Grandegger
       [not found]                                       ` <loom.20121121T160744-278@post.gmane.or  g>
                                                         ` (2 more replies)
  2012-11-21 14:52                                     ` Michael Pellegrini
  1 sibling, 3 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 10:24 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 4453 bytes --]

On 11/20/2012 10:05 PM, Wolfgang Grandegger wrote:
> Hi Michael,
> 
> On 11/20/2012 08:12 PM, Michael Pellegrini wrote:
>> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>>
>>>>
>>>> However, everything isn't fixed yet.  After sending a handful of messages, the
>>>> CAN interface hits the same problem that started this thread.  My application
>>>> prints the error "No buffer space available" and the only way to recover is an
>>>> interface reset via "ifconfig can0 down" followed by "ifconfig can0 up".
>>>
>>> Could you please send the output of "candump any,0:0,#FFFFFFFF" while
>>> sending the messages. Do they go out to the bus? Also "ip -d -s link
>>> show can0" would be useful.
>>
>> candump produced no output while my application was running and trying to send
>> messages.  Furthermore, the lack of output from candump suggests that messages
>> aren't being received as well.  There is another CAN device on the bus and I can
>> confirm, using a CAN monitor tool on another PC, that it is actively sending
>> messages.
>>
>> Upon starting my application, 16 messages are transmitted on the bus (confirmed
>> via the external CAN monitor tool) which candump does not report.  Then
>> transmission stops.  This lines up with the output of "ip -d -s link show can0",
>> shown below:
> 
> I'm a bit confused. Where is the application running? On this
> PCH-System? Or does an external node send messages, which are not
> received by the PCH-C_CAN?
> 
>> At startup:
>>
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>     link/can
>>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>     bitrate 250000 sample-point 0.875
>>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>>     clock 50000000
>>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>>     0          0          0          0          0          0
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     0          0        0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     0          0        0       0       0       0
>>
>> After transmission stops:
>>
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>     link/can
>>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>     bitrate 250000 sample-point 0.875
>>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>>     clock 50000000
>>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>>     0          0          0          0          0          0
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     0          0        0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     0          0        0       0       0       0
>>
>> After transmission stops & interface is reset:
>>
>> 4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>     link/can
>>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>     bitrate 250000 sample-point 0.875
>>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>>     clock 50000000
>>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>>     0          0          0          0          0          0
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     0          0        0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     0          0        0       16      0       0
>>
>> The best I can tell, the 16 messages which are successfully transmitted on the
>> bus are considered dropped by the driver.
> 
> OK, we still have a problem with TX and RX, most likely with interrupts.
> Could you please send a message on the PCH-System executing
> "cansend can0 123#abcdef":
> 
> - Does "candump any,0:0,#FFFFFFFF" report anything?
> 
> - Does the message show up on the bus (visible to another node or
>   analyser)?
> 
> - Does the interrupt count increase? Check output of /proc/interrupts.
> 
> I will prepare an instrumented driver tomorrow.

Could you please try the attached c-can-pci-v4.tar.bz2. I'm puzzled why
MSI does not get enabled. What does "lspci -vv" list for that device?

Wolfgang.



[-- Attachment #2: c-can-pci-v4.tar.bz2 --]
[-- Type: application/x-bzip, Size: 12142 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-20 21:05                                   ` Wolfgang Grandegger
  2012-11-21 10:24                                     ` Wolfgang Grandegger
@ 2012-11-21 14:52                                     ` Michael Pellegrini
  2012-11-21 15:02                                       ` Wolfgang Grandegger
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 14:52 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> I'm a bit confused. Where is the application running? On this
> PCH-System? Or does an external node send messages, which are not
> received by the PCH-C_CAN?

The application is running on the PCH-System.  Here are the systems on the bus:

- PCH-System: Runs the application which sends the majority of CAN messages.
- External Node: Sends periodic CAN messages.
- CAN Monitor: Monitors all bus traffic.

The PCH-System sends 16 CAN messages successfully before transmission stops.
The CAN Monitor system successfully receives these messages.
As shown by the "ip" command output, the driver considers these messages
dropped, not transmitted.
The External Node system sends periodic messages, which I can see on the CAN
Monitor system.  The PCH-System does not receive these messages.

> OK, we still have a problem with TX and RX, most likely with interrupts.
> Could you please send a message on the PCH-System executing
> "cansend can0 123#abcdef":
> 
> - Does "candump any,0:0,#FFFFFFFF" report anything?

No, candump reports nothing.

> - Does the message show up on the bus (visible to another node or
>   analyser)?

Yes, the message shows up on the bus.  The CAN Monitor system receives
the message.

> - Does the interrupt count increase? Check output of /proc/interrupts.

I'm not sure exactly which piece of data you're looking for, so here's
the entirety of /proc/interrupts:

           CPU0       CPU1
  0:   68369230          0   IO-APIC-edge      timer
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 16:      27694          0   IO-APIC-fasteoi   PCIe PME, PCIe PME, PCIe PME, PCIe PME, ehci_hcd:usb2, ohci_hcd:usb6, ohci_hcd:usb7, ohci_hcd:usb8, gpio_pch, snd_hda_intel
 18:          0          0   IO-APIC-fasteoi   mmc0, mmc1
 19:      22916          0   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
 40:      83612          0   PCI-MSI-edge      ahci
 41:          0          0   PCI-MSI-edge      can0
 42:     176385          0   PCI-MSI-edge      eth0
NMI:       4569       4569   Non-maskable interrupts
LOC:    7625157   47016568   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:       4569       4569   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:    4979140    4962335   Rescheduling interrupts
CAL:        137        308   Function call interrupts
TLB:       2670       2578   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        211        211   Machine check polls
ERR:          0
MIS:          0

Note that this test was run with v3 of the driver.

- Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 14:52                                     ` Michael Pellegrini
@ 2012-11-21 15:02                                       ` Wolfgang Grandegger
  0 siblings, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 15:02 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/21/2012 03:52 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> I'm a bit confused. Where is the application running? On this
>> PCH-System? Or does an external node send messages, which are not
>> received by the PCH-C_CAN?
> 
> The application is running on the PCH-System.  Here are the systems on the bus:
> 
> - PCH-System: Runs the application which sends the majority of CAN messages.
> - External Node: Sends periodic CAN messages.
> - CAN Monitor: Monitors all bus traffic.
> 
> The PCH-System sends 16 CAN messages successfully before transmission stops.
> The CAN Monitor system successfully receives these messages.
> As shown by the "ip" command output, the driver considers these messages
> dropped, not transmitted.
> The External Node system sends periodic messages, which I can see on the CAN
> Monitor system.  The PCH-System does not receive these messages.
> 
>> OK, we still have a problem with TX and RX, most likely with interrupts.
>> Could you please send a message on the PCH-System executing
>> "cansend can0 123#abcdef":
>>
>> - Does "candump any,0:0,#FFFFFFFF" report anything?
> 
> No, candump reports nothing.

OK, normally you should see the loopback of the transmitted messages
here. But the loopback does not happen because the TX-done interrupt
does not occur.

> 
>> - Does the message show up on the bus (visible to another node or
>>   analyser)?
> 
> Yes, the message shows up on the bus.  The CAN Monitor system receives
> the message.

OK.

>> - Does the interrupt count increase? Check output of /proc/interrupts.
> 
> I'm not sure exactly which piece of data you're looking for, so here's
> the entirety of /proc/interrupts:
> 
>            CPU0       CPU1
>   0:   68369230          0   IO-APIC-edge      timer
>   8:          1          0   IO-APIC-edge      rtc0
>   9:          0          0   IO-APIC-fasteoi   acpi
>  16:      27694          0   IO-APIC-fasteoi   PCIe PME, PCIe PME, PCIe PME, PCIe PME, ehci_hcd:usb2, ohci_hcd:usb6, ohci_hcd:usb7, ohci_hcd:usb8, gpio_pch, snd_hda_intel
>  18:          0          0   IO-APIC-fasteoi   mmc0, mmc1
>  19:      22916          0   IO-APIC-fasteoi   ehci_hcd:usb1, ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5
>  40:      83612          0   PCI-MSI-edge      ahci
>  41:          0          0   PCI-MSI-edge      can0

We do *not* yet get any interrupts, and that's why TX and RX do not yet
work.

> Note that this test was run with v3 of the driver.

Ok, could you please try v4. Thanks.

Wolfgang.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 10:24                                     ` Wolfgang Grandegger
       [not found]                                       ` <loom.20121121T160744-278@post.gmane.or  g>
@ 2012-11-21 15:15                                       ` Michael Pellegrini
       [not found]                                       ` <loom.20121121T160744-278@post.gmane.or g>
  2 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 15:15 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Could you please try the attached c-can-pci-v4.tar.bz2. I'm puzzled why
> MSI does not get enabled. What does "lspci -vv" list for that device?

"lspci -vv" lists:

02:0c.3 CANBUS: Intel Corporation Platform Controller Hub EG20T Controller Area Network (CAN) Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin C routed to IRQ 41
        Region 1: Memory at d0143000 (32-bit, non-prefetchable) [size=512]
        Capabilities: [40] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0300c  Data: 4179
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: c_can_pci
        Kernel modules: pch_can_test, pch_can

This driver is working very well so far.  ifconfig shows that I am sending and
receiving CAN messages successfully.  This is confirmed externally by the CAN
Monitor system.

I will start a more intensive test now and report the results.

- Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
       [not found]                                       ` <loom.20121121T160744-278@post.gmane.or g>
@ 2012-11-21 15:25                                         ` Michael Pellegrini
  2012-11-21 15:32                                           ` Marc Kleine-Budde
                                                             ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 15:25 UTC (permalink / raw)
  To: linux-can

Michael Pellegrini <mikep86 <at> gmail.com> writes:

> This driver is working very well so far.  ifconfig shows that I am sending and
> receiving CAN messages successfully.  This is confirmed externally by the CAN
> Monitor system.
> 
> I will start a more intensive test now and report the results.

I shouldn't have spoken so soon.  Running my application on the PCH-System
caused the original problem within minutes.  Message transmission stopped
and my application started printing the error "No buffer space available".
The only way to recover was to reset the interface.

- Mike
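
"No buffer space available" is the text for errno ENOBUFS, which an
application sees from write() when frames can no longer be queued for the
interface. Below is a minimal sketch in C of how a sender might tell a
momentarily full queue from a stalled one, assuming a raw SocketCAN file
descriptor. send_frame_bounded(), the retry count and the 10 ms pause are
hypothetical choices made for illustration; they are not taken from the
application discussed here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

#include <linux/can.h>

/*
 * Write one CAN frame, retrying a bounded number of times on ENOBUFS.
 * Returns 0 on success. Returns -1 with errno == ENOBUFS if the queue
 * never drained, or -1 with another errno for unrelated failures.
 */
int send_frame_bounded(int fd, const struct can_frame *frame, int max_retries)
{
    assert(frame != NULL);

    /* Pause between attempts; 10 ms is an arbitrary illustrative value. */
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        ssize_t n = write(fd, frame, sizeof(*frame));

        if (n == (ssize_t)sizeof(*frame))
            return 0;                 /* frame accepted */
        if (n >= 0) {
            errno = EIO;              /* short write, treat as an error */
            return -1;
        }
        if (errno != ENOBUFS)
            return -1;                /* unrelated error, errno from write() */
        nanosleep(&pause, NULL);      /* queue full, give it time to drain */
    }
    errno = ENOBUFS;                  /* still full after all retries */
    return -1;
}
```

Retrying does not cure the stall described in this thread, where only an
interface reset recovered; the bounded loop just lets the application
report that the queue never drained instead of failing on every send.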




^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 15:25                                         ` Michael Pellegrini
@ 2012-11-21 15:32                                           ` Marc Kleine-Budde
  2012-11-21 16:11                                             ` Michael Pellegrini
  2012-11-21 15:41                                           ` Michael Pellegrini
  2012-11-21 15:56                                           ` Wolfgang Grandegger
  2 siblings, 1 reply; 79+ messages in thread
From: Marc Kleine-Budde @ 2012-11-21 15:32 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 1862 bytes --]

On 11/21/2012 04:25 PM, Michael Pellegrini wrote:
> Michael Pellegrini <mikep86 <at> gmail.com> writes:
> 
>> This driver is working very well so far.  ifconfig shows that I am sending and
>> receiving CAN messages successfully.  This is confirmed externally by the CAN
>> Monitor system.
>>
>> I will start a more intensive test now and report the results.
> 
> I shouldn't have spoken so soon.  Running my application on the PCH-System
> caused the original problem within minutes.  Message transmission stopped
> and my application started printing the error "No buffer space available".
> The only way to recover was to reset the interface.

Can you try this patch (on top of Wolfgang's)?

Marc

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 5233b8f..2849224 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -543,8 +543,6 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,

        msg_obj_no = get_tx_next_msg_obj(priv);

-       /* prepare message object for transmission */
-       c_can_write_msg_object(dev, 0, frame, msg_obj_no);
        can_put_echo_skb(skb, dev, msg_obj_no - C_CAN_MSG_OBJ_TX_FIRST);

        /*
@@ -556,6 +554,9 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
                        (priv->tx_next & C_CAN_NEXT_MSG_OBJ_MASK) == 0)
                netif_stop_queue(dev);

+       /* prepare message object for transmission */
+       c_can_write_msg_object(dev, 0, frame, msg_obj_no);
+
        return NETDEV_TX_OK;
 }



-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 261 bytes --]

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 15:25                                         ` Michael Pellegrini
  2012-11-21 15:32                                           ` Marc Kleine-Budde
@ 2012-11-21 15:41                                           ` Michael Pellegrini
  2012-11-21 15:56                                           ` Wolfgang Grandegger
  2 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 15:41 UTC (permalink / raw)
  To: linux-can

Michael Pellegrini <mikep86 <at> gmail.com> writes:

> I shouldn't have spoken so soon.  Running my application on the PCH-System
> caused the original problem within minutes.  Message transmission stopped
> and my application started printing the error "No buffer space available".
> The only way to recover was to reset the interface.

Something of note: When I encounter the message transmission problem,
dmesg prints the message "c_can_isr: irqstatus=0xXX" continuously, with the
same value for irqstatus.  So far, I have seen irqstatus=0x17 and
irqstatus=0x19.

- Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 15:25                                         ` Michael Pellegrini
  2012-11-21 15:32                                           ` Marc Kleine-Budde
  2012-11-21 15:41                                           ` Michael Pellegrini
@ 2012-11-21 15:56                                           ` Wolfgang Grandegger
  2012-11-21 16:09                                             ` Michael Pellegrini
  2 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 15:56 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/21/2012 04:25 PM, Michael Pellegrini wrote:
> Michael Pellegrini <mikep86 <at> gmail.com> writes:
> 
>> This driver is working very well so far.  ifconfig shows that I am sending and
>> receiving CAN messages successfully.  This is confirmed externally by the CAN
>> Monitor system.
>>
>> I will start a more intensive test now and report the results.
> 
> I shouldn't have spoken so soon.  Running my application on the PCH-System
> caused the original problem within minutes.  Message transmission stopped
> and my application started printing the error "No buffer space available".
> The only way to recover was to reset the interface.

OK, at least we have reached the level of the PCH_CAN driver. The driver
does print some debug messages. Could you show them?

Wolfgang.



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 15:56                                           ` Wolfgang Grandegger
@ 2012-11-21 16:09                                             ` Michael Pellegrini
  2012-11-21 16:41                                               ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 16:09 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> OK, at least we have reached the level of the PCH_CAN driver. The driver
> does print some debug messages. Could you show them?

Do you mean the dmesg output?  Dmesg output during the transmission problem is
an uninterrupted stream of "c_can_isr: irqstatus=0xXX", where XX is constant.
So far, I have seen the values 0x17, 0x19, and 0x1e in the three times I have
hit the transmission problem.

- Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 15:32                                           ` Marc Kleine-Budde
@ 2012-11-21 16:11                                             ` Michael Pellegrini
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 16:11 UTC (permalink / raw)
  To: linux-can

Marc Kleine-Budde <mkl <at> pengutronix.de> writes:

> Can you try this patch (on top of Wolfgang's)?

Applied the patch and the problem still occurs.

- Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 16:09                                             ` Michael Pellegrini
@ 2012-11-21 16:41                                               ` Wolfgang Grandegger
  2012-11-21 16:58                                                 ` Casper Mogensen
  2012-11-21 17:43                                                 ` Michael Pellegrini
  0 siblings, 2 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 16:41 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/21/2012 05:09 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> OK, at least we have reached the level of the PCH_CAN driver. The driver
>> does print some debug messages. Could you show them?
> 
> Do you mean the dmesg output?  Dmesg output during the transmission problem is
> an uninterrupted stream of "c_can_isr: irqstatus=0xXX", where XX is constant.
> So far, I have seen the values 0x17, 0x19, and 0x1e in the three times I have
> hit the transmission problem.

Could you show the sequence? I need to look up the meaning of the bits.

Wolfgang.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 16:41                                               ` Wolfgang Grandegger
@ 2012-11-21 16:58                                                 ` Casper Mogensen
  2012-11-21 19:48                                                   ` Wolfgang Grandegger
  2012-11-21 17:43                                                 ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Casper Mogensen @ 2012-11-21 16:58 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Michael Pellegrini, linux-can

Hi again

I just took a quick look at the C_CAN driver and, from what I can see,
it suffers from the same problem that I described earlier for pch_can.
There is no locking whatsoever, yet the xmit function and c_can_do_tx()
use the same place in the RAM buffer, and they follow the same procedure
as pch_can: set up the registers, then write them out at the end of
each function.

Regards

Casper Mogensen

On Wed, Nov 21, 2012 at 5:41 PM, Wolfgang Grandegger <wg@grandegger.com> wrote:
> On 11/21/2012 05:09 PM, Michael Pellegrini wrote:
>> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>>
>>> OK, at least we have reached the level of the PCH_CAN driver. The driver
>>> does print some debug messages. Could you show them?
>>
>> Do you mean the dmesg output?  Dmesg output during the transmission problem is
>> an uninterrupted stream of "c_can_isr: irqstatus=0xXX", where XX is constant.
>> So far, I have seen the values 0x17, 0x19, and 0x1e in the three times I have
>> hit the transmission problem.
>
> Could you show the sequence? I need to look up the meaning of the bits.
>
> Wolfgang.



-- 
Regards / Med venlig hilsen

Casper Mogensen

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 16:41                                               ` Wolfgang Grandegger
  2012-11-21 16:58                                                 ` Casper Mogensen
@ 2012-11-21 17:43                                                 ` Michael Pellegrini
  2012-11-21 19:55                                                   ` Wolfgang Grandegger
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 17:43 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Could you show the sequence? I need to look up the meaning of the bits.

I tried to capture the transition from functional to non-functional transmission
in dmesg, but the data gets logged too quickly.  Here's the best I can do:

Functional transmission:

[74734.413300] c_can_isr: irqstatus=0x15
[74734.441193] c_can_isr: irqstatus=0x8000
[74734.441265] c_can_isr: irqstatus=0x16
[74734.451140] c_can_isr: irqstatus=0x8000
[74734.451196] c_can_isr: irqstatus=0x17
[74734.461468] c_can_isr: irqstatus=0x8000
[74734.461523] c_can_isr: irqstatus=0x18
[74734.470794] c_can_isr: irqstatus=0x8000
[74734.470846] c_can_isr: irqstatus=0x19
[74734.488507] c_can_isr: irqstatus=0x8000
[74734.488567] c_can_isr: irqstatus=0x1a
[74734.500074] c_can_isr: irqstatus=0x8000
[74734.500127] c_can_isr: irqstatus=0x1b
[74734.515078] c_can_isr: irqstatus=0x8000
[74734.515133] c_can_isr: irqstatus=0x1c
[74734.543469] c_can_isr: irqstatus=0x8000
[74734.543509] c_can_isr: irqstatus=0x1d
[74734.553434] c_can_isr: irqstatus=0x8000
[74734.553484] c_can_isr: irqstatus=0x1e

Non-functional transmission:

[74553.334110] c_can_isr: irqstatus=0x16
[74553.334135] c_can_isr: irqstatus=0x16
[74553.334161] c_can_isr: irqstatus=0x16
[74553.334185] c_can_isr: irqstatus=0x16
[74553.334209] c_can_isr: irqstatus=0x16
[74553.334262] c_can_isr: irqstatus=0x16
[74553.334290] c_can_isr: irqstatus=0x16
[74553.334316] c_can_isr: irqstatus=0x16
[74553.334342] c_can_isr: irqstatus=0x16
[74553.334368] c_can_isr: irqstatus=0x16
[74553.334396] c_can_isr: irqstatus=0x16
[74553.334422] c_can_isr: irqstatus=0x16
[74553.334448] c_can_isr: irqstatus=0x16
[74553.334474] c_can_isr: irqstatus=0x16
[74553.334499] c_can_isr: irqstatus=0x16
[74553.334525] c_can_isr: irqstatus=0x16
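
For anyone reading these logs, here is a rough sketch of how the irqstatus
values can be classified. It relies on the interrupt register encoding from
the Bosch C_CAN manual (0x0000 nothing pending, 0x0001..0x0020 message object
number, 0x8000 status interrupt). The RX/TX split (objects 1..16 receive,
17..32 transmit) is an assumption taken from the mainline c_can driver of
that time and may not match the test driver used here; the helper names are
made up for illustration. Under that assumption, 0x16 would be a transmit
object.

```c
#include <stdint.h>

#define IRQ_STATUS_INTERRUPT 0x8000u /* status change (TXOK, RXOK, errors) */
#define MSG_OBJ_FIRST        0x0001u
#define MSG_OBJ_RX_LAST      0x0010u /* assumption: objects 1..16 are RX */
#define MSG_OBJ_LAST         0x0020u /* assumption: objects 17..32 are TX */

enum irq_source { IRQ_NONE, IRQ_STATUS, IRQ_RX_OBJ, IRQ_TX_OBJ, IRQ_UNUSED };

/* Map a raw irqstatus value, as printed by the debug message, to its source. */
static enum irq_source classify_irqstatus(uint16_t v)
{
    if (v == 0)
        return IRQ_NONE;
    if (v == IRQ_STATUS_INTERRUPT)
        return IRQ_STATUS;
    if (v >= MSG_OBJ_FIRST && v <= MSG_OBJ_RX_LAST)
        return IRQ_RX_OBJ;
    if (v > MSG_OBJ_RX_LAST && v <= MSG_OBJ_LAST)
        return IRQ_TX_OBJ;
    return IRQ_UNUSED; /* 0x0021..0x7FFF and 0x8001..0xFFFF are not used */
}

/* Count how many times the last logged value repeats at the end of the log.
 * A long run of one message object number is the "stuck" pattern above. */
static unsigned int trailing_repeats(const uint16_t *log, unsigned int n)
{
    unsigned int run = 0;

    while (run < n && log[n - 1 - run] == log[n - 1])
        run++;
    return run;
}
```

In the functional log the status interrupt alternates with an increasing
object number, so trailing_repeats() stays at 1; in the non-functional log it
grows without bound.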

- Mike



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 16:58                                                 ` Casper Mogensen
@ 2012-11-21 19:48                                                   ` Wolfgang Grandegger
  0 siblings, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 19:48 UTC (permalink / raw)
  To: Casper Mogensen; +Cc: Michael Pellegrini, linux-can

On 11/21/2012 05:58 PM, Casper Mogensen wrote:
> Hi again
> 
> I just took a quick look at the C_CAN driver and, from what I can see,
> it suffers from the same problem that I described earlier for pch_can.
> There is no locking whatsoever, yet the xmit function and c_can_do_tx()
> use the same place in the RAM buffer, and they follow the same procedure
> as pch_can: set up the registers, then write them out at the end of
> each function.

I just started to have a closer look. Yes, that looks racy. I'm going to
prepare an updated driver.

Wolfgang.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 17:43                                                 ` Michael Pellegrini
@ 2012-11-21 19:55                                                   ` Wolfgang Grandegger
  2012-11-21 21:00                                                     ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-21 19:55 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

[-- Attachment #1: Type: text/plain, Size: 2281 bytes --]

On 11/21/2012 06:43 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> Could you show the sequence? I need to look up the meaning of the bits.
> 
> I tried to capture the transition from functional to non-functional transmission
> in dmesg, but the data gets logged too quickly.  Here's the best I can do:
> 
> Functional transmission:
> 
> [74734.413300] c_can_isr: irqstatus=0x15
> [74734.441193] c_can_isr: irqstatus=0x8000
> [74734.441265] c_can_isr: irqstatus=0x16
> [74734.451140] c_can_isr: irqstatus=0x8000
> [74734.451196] c_can_isr: irqstatus=0x17
> [74734.461468] c_can_isr: irqstatus=0x8000
> [74734.461523] c_can_isr: irqstatus=0x18
> [74734.470794] c_can_isr: irqstatus=0x8000
> [74734.470846] c_can_isr: irqstatus=0x19
> [74734.488507] c_can_isr: irqstatus=0x8000
> [74734.488567] c_can_isr: irqstatus=0x1a
> [74734.500074] c_can_isr: irqstatus=0x8000
> [74734.500127] c_can_isr: irqstatus=0x1b
> [74734.515078] c_can_isr: irqstatus=0x8000
> [74734.515133] c_can_isr: irqstatus=0x1c
> [74734.543469] c_can_isr: irqstatus=0x8000
> [74734.543509] c_can_isr: irqstatus=0x1d
> [74734.553434] c_can_isr: irqstatus=0x8000
> [74734.553484] c_can_isr: irqstatus=0x1e
> 
> Non-functional transmission:
> 
> [74553.334110] c_can_isr: irqstatus=0x16
> [74553.334135] c_can_isr: irqstatus=0x16
> [74553.334161] c_can_isr: irqstatus=0x16
> [74553.334185] c_can_isr: irqstatus=0x16
> [74553.334209] c_can_isr: irqstatus=0x16
> [74553.334262] c_can_isr: irqstatus=0x16
> [74553.334290] c_can_isr: irqstatus=0x16
> [74553.334316] c_can_isr: irqstatus=0x16
> [74553.334342] c_can_isr: irqstatus=0x16
> [74553.334368] c_can_isr: irqstatus=0x16
> [74553.334396] c_can_isr: irqstatus=0x16
> [74553.334422] c_can_isr: irqstatus=0x16
> [74553.334448] c_can_isr: irqstatus=0x16
> [74553.334474] c_can_isr: irqstatus=0x16
> [74553.334499] c_can_isr: irqstatus=0x16
> [74553.334525] c_can_isr: irqstatus=0x16

Hm, object 0x16 does not get handled for some reason and then the
interrupt re-occurs. Could you please try "c-can-pci-v5.tar.bz2"? As
suggested by Casper, it uses a spinlock in c_can_start_xmit() and
c_can_do_tx(), mainly to protect priv->tx_next. If that doesn't help, we
need more sophisticated instrumentation and a trigger.
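
The idea of guarding the TX bookkeeping with one lock can be sketched in user
space as follows. This is not the code from the attached tarball, which is
not reproduced in this thread: the struct, the function names and
TX_OBJ_COUNT are hypothetical, and a pthread mutex stands in for the kernel
spinlock. The point is only that the transmit path and the completion path
read and update tx_next and tx_echo under the same lock.

```c
#include <pthread.h>
#include <stdbool.h>

#define TX_OBJ_COUNT 16u /* hypothetical number of TX message objects */

struct tx_ring {
    pthread_mutex_t lock; /* stands in for the driver's spinlock */
    unsigned int tx_next; /* next slot the transmit path will fill */
    unsigned int tx_echo; /* next slot the completion path will reap */
};

/* Transmit path: claim the next free slot. Returns false when all slots
 * are in flight, which is where a driver would stop the queue. */
static bool tx_ring_claim(struct tx_ring *r, unsigned int *slot)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if (r->tx_next - r->tx_echo < TX_OBJ_COUNT) {
        *slot = r->tx_next % TX_OBJ_COUNT;
        r->tx_next++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

/* Completion path: reap the oldest slot still in flight, if there is one. */
static bool tx_ring_complete(struct tx_ring *r, unsigned int *slot)
{
    bool ok = false;

    pthread_mutex_lock(&r->lock);
    if (r->tx_echo != r->tx_next) {
        *slot = r->tx_echo % TX_OBJ_COUNT;
        r->tx_echo++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}
```

A ring would be set up with `struct tx_ring r = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };`
and the two functions called from different threads. Without the lock, a
completion that runs between the check and the increment in tx_ring_claim()
could leave the two counters inconsistent.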

Wolfgang.




[-- Attachment #2: c-can-pci-v5.tar.bz2 --]
[-- Type: application/x-bzip, Size: 12243 bytes --]

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 19:55                                                   ` Wolfgang Grandegger
@ 2012-11-21 21:00                                                     ` Michael Pellegrini
  2012-11-23 14:27                                                       ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-21 21:00 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Hm, object 0x16 does not get handled for some reason and then the
> interrupt re-occurs. Could you please try "c-can-pci-v5.tar.bz2"? As
> suggested by Casper, it uses a spinlock in c_can_start_xmit() and
> c_can_do_tx(), mainly to protect priv->tx_next. If that doesn't help, we
> need more sophisticated instrumentation and a trigger.

My application has been running strong for about 45 minutes and counting with
this driver.  I will leave the system running over Thanksgiving as a long-term
test.

- Mike





^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-21 21:00                                                     ` Michael Pellegrini
@ 2012-11-23 14:27                                                       ` Michael Pellegrini
  2012-11-23 14:45                                                         ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-23 14:27 UTC (permalink / raw)
  To: linux-can

Michael Pellegrini <mikep86 <at> gmail.com> writes:

> My application has been running strong for about 45 minutes and counting with
> this driver.  I will leave the system running over Thanksgiving as a long-term
> test.

The driver has unfortunately failed the long-term test.  When I checked the
PCH-System this morning, it had hit the transmission problem again.  Dmesg 
output is:

[234700.232657] c_can_isr: irqstatus=0x6
[234700.232712] c_can_isr: irqstatus=0x6
[234700.232765] c_can_isr: irqstatus=0x6
[234700.232818] c_can_isr: irqstatus=0x6
[234700.232873] c_can_isr: irqstatus=0x6
[234700.232928] c_can_isr: irqstatus=0x6
[234700.232985] c_can_isr: irqstatus=0x6
[234700.233041] c_can_isr: irqstatus=0x6
[234700.233096] c_can_isr: irqstatus=0x6
[234700.233151] c_can_isr: irqstatus=0x6
[234700.233203] c_can_isr: irqstatus=0x6
[234700.233257] c_can_isr: irqstatus=0x6
[234700.233312] c_can_isr: irqstatus=0x6
[234700.233369] c_can_isr: irqstatus=0x6
[234700.233424] c_can_isr: irqstatus=0x6
[234700.233478] c_can_isr: irqstatus=0x6

"ip -d -s link show can0" output is:

8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    102603     43967    0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    4487315    1082899  0       0       0       0

I tried sending a message with "cansend can0 123#abcdef" and got the error 
message "write: No buffer space available".

Additionally, data reception is broken.  I can confirm via the CAN Monitor
system that the External Node system is sending messages which the PCH-System
should be receiving.  However, the RX count is not increasing and 
"candump any,0:0,#FFFFFFFF" does not show any messages being transmitted or
received on the interface.

Resetting the interface fixes data transmission and reception.  After the reset,
"ip -d -s link show can0" output is:

8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 250000 sample-point 0.875
    tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
    c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 50000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    103148     44200    0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    4487318    1082900  0       13      0       0

After the reset, dmesg output is:

[235689.911732] c_can_isr: irqstatus=0x7
[235690.358052] c_can_isr: irqstatus=0x8000
[235690.358111] c_can_isr: irqstatus=0x8
[235690.358282] c_can_isr: irqstatus=0x9
[235690.975410] c_can_isr: irqstatus=0x8000
[235690.975466] c_can_isr: irqstatus=0x1
[235691.420692] c_can_isr: irqstatus=0x8000
[235691.420749] c_can_isr: irqstatus=0x2
[235691.420899] c_can_isr: irqstatus=0x8000
[235691.420937] c_can_isr: irqstatus=0x3
[235692.038095] c_can_isr: irqstatus=0x8000
[235692.038151] c_can_isr: irqstatus=0x4
[235692.483325] c_can_isr: irqstatus=0x8000
[235692.483381] c_can_isr: irqstatus=0x5
[235692.483548] c_can_isr: irqstatus=0x8000
[235692.483587] c_can_isr: irqstatus=0x6
[235693.100729] c_can_isr: irqstatus=0x8000
[235693.100785] c_can_isr: irqstatus=0x7
[235693.545970] c_can_isr: irqstatus=0x8000
[235693.546027] c_can_isr: irqstatus=0x8
[235693.546195] c_can_isr: irqstatus=0x9

- Mike


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 14:27                                                       ` Michael Pellegrini
@ 2012-11-23 14:45                                                         ` Wolfgang Grandegger
  2012-11-23 14:47                                                           ` Wolfgang Grandegger
  2012-11-23 15:04                                                           ` Michael Pellegrini
  0 siblings, 2 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-23 14:45 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/23/2012 03:27 PM, Michael Pellegrini wrote:
> Michael Pellegrini <mikep86 <at> gmail.com> writes:
> 
>> My application has been running strong for about 45 minutes and counting with
>> this driver.  I will leave the system running over Thanksgiving as a long-term
>> test.
> 
> The driver has unfortunately failed the long-term test.  When I checked the
> PCH-System this morning, it had hit the transmission problem again.  Dmesg 
> output is:
> 
> [234700.232657] c_can_isr: irqstatus=0x6
> [234700.232712] c_can_isr: irqstatus=0x6
> [234700.232765] c_can_isr: irqstatus=0x6
> [234700.232818] c_can_isr: irqstatus=0x6
> [234700.232873] c_can_isr: irqstatus=0x6
> [234700.232928] c_can_isr: irqstatus=0x6
> [234700.232985] c_can_isr: irqstatus=0x6
> [234700.233041] c_can_isr: irqstatus=0x6
> [234700.233096] c_can_isr: irqstatus=0x6
> [234700.233151] c_can_isr: irqstatus=0x6
> [234700.233203] c_can_isr: irqstatus=0x6
> [234700.233257] c_can_isr: irqstatus=0x6
> [234700.233312] c_can_isr: irqstatus=0x6
> [234700.233369] c_can_isr: irqstatus=0x6
> [234700.233424] c_can_isr: irqstatus=0x6
> [234700.233478] c_can_isr: irqstatus=0x6

Did you see any other related kernel messages? For real testing you
should remove the debug message above. I will try to add a more
sophisticated trigger.

> "ip -d -s link show can0" output is:
> 
> 8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>     link/can
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>     bitrate 250000 sample-point 0.875
>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>     clock 50000000
>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>     0          0          0          0          0          0
>     RX: bytes  packets  errors  dropped overrun mcast
>     102603     43967    0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     4487315    1082899  0       0       0       0
> 
> I tried sending a message with "cansend can0 123#abcdef" and got the error 
> message "write: No buffer space available".

Yes, that's the old problem.

> Additionally, data reception is broken.  I can confirm via the CAN Monitor
> system that the External Node system is sending messages which the PCH-System
> should be receiving.  However, the RX count is not increasing and 
> "candump any,0:0,#FFFFFFFF" does not show any messages being transmitted or
> received on the interface.

That's likely because the interrupt from the repeated message is not
handled. We seem to have another race. Maybe device access needs to be
protected as well.

Hope to find more time to look into this problem over the weekend.

Wolfgang.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 14:45                                                         ` Wolfgang Grandegger
@ 2012-11-23 14:47                                                           ` Wolfgang Grandegger
  2012-11-23 15:14                                                             ` Michael Pellegrini
  2012-11-23 15:04                                                           ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-23 14:47 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/23/2012 03:45 PM, Wolfgang Grandegger wrote:
> On 11/23/2012 03:27 PM, Michael Pellegrini wrote:
>> Michael Pellegrini <mikep86 <at> gmail.com> writes:
>>
>>> My application has been running strong for about 45 minutes and counting with
>>> this driver.  I will leave the system running over Thanksgiving as a long-term
>>> test.
>>
>> The driver has unfortunately failed the long-term test.  When I checked the
>> PCH-System this morning, it had hit the transmission problem again.  Dmesg 
>> output is:
>>
>> [234700.232657] c_can_isr: irqstatus=0x6
>> [234700.232712] c_can_isr: irqstatus=0x6
>> [234700.232765] c_can_isr: irqstatus=0x6
>> [234700.232818] c_can_isr: irqstatus=0x6
>> [234700.232873] c_can_isr: irqstatus=0x6
>> [234700.232928] c_can_isr: irqstatus=0x6
>> [234700.232985] c_can_isr: irqstatus=0x6
>> [234700.233041] c_can_isr: irqstatus=0x6
>> [234700.233096] c_can_isr: irqstatus=0x6
>> [234700.233151] c_can_isr: irqstatus=0x6
>> [234700.233203] c_can_isr: irqstatus=0x6
>> [234700.233257] c_can_isr: irqstatus=0x6
>> [234700.233312] c_can_isr: irqstatus=0x6
>> [234700.233369] c_can_isr: irqstatus=0x6
>> [234700.233424] c_can_isr: irqstatus=0x6
>> [234700.233478] c_can_isr: irqstatus=0x6
> 
> Did you see any other related kernel messages? For real testing you
> should remove the debug message above. I will try to add a more
> sophisticated trigger.
> 
>> "ip -d -s link show can0" output is:
>>
>> 8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>     link/can
>>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>     bitrate 250000 sample-point 0.875
>>     tq 500 prop-seg 3 phase-seg1 3 phase-seg2 1 sjw 1
>>     c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>>     clock 50000000
>>     re-started bus-errors arbit-lost error-warn error-pass bus-off
>>     0          0          0          0          0          0
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     102603     43967    0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     4487315    1082899  0       0       0       0
>>
>> I tried sending a message with "cansend can0 123#abcdef" and got the error 
>> message "write: No buffer space available".
> 
> Yes, that's the old problem.
> 
>> Additionally, data reception is broken.  I can confirm via the CAN Monitor
>> system that the External Node system is sending messages which the PCH-System
>> should be receiving.  However, the RX count is not increasing and 
>> "candump any,0:0,#FFFFFFFF" does not show any messages being transmitted or
>> received on the interface.
> 
> That's likely because the interrupt from the repeated message is not
> handled. We seem to have another race. Maybe device access needs to be
> protected as well.
> 
> Hope to find more time to look into this problem over the weekend.

When the module is loaded the driver prints out some values. Could you
please show the output?

Another question: at what rate do you send messages?

Thanks,

Wolfgang.


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 14:45                                                         ` Wolfgang Grandegger
  2012-11-23 14:47                                                           ` Wolfgang Grandegger
@ 2012-11-23 15:04                                                           ` Michael Pellegrini
  2012-11-23 17:00                                                             ` Wolfgang Grandegger
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-23 15:04 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Did you see any other related kernel messages? For real testing you
> should remove the debug message above. I will try to add a more
> sophisticated trigger.

I didn't see any other messages.  That message prints in such a tight
loop that it's hard to catch any other messages.

- Mike





^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 14:47                                                           ` Wolfgang Grandegger
@ 2012-11-23 15:14                                                             ` Michael Pellegrini
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-23 15:14 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> When the module is loaded the driver prints out some values. Could you
> please show the output?

On module load, dmesg reports:

[238233.481246] c_can_pci 0000:02:0c.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[238233.481440] c_can_pci 0000:02:0c.3: irq 41 for MSI/MSI-X
[238233.481483] c_can_pci 0000:02:0c.3: MSI enabled
[238233.481519] c_can_pci 0000:02:0c.3: setting latency timer to 64
[238233.481543] c_can_pci_probe: bar=1
[238233.481624] c_can_pci_probe: addr=f842c000
[238233.492906] CTRL_REG=0x0
[238233.492931] BTR_REG =0x518
[238233.492949] TEST_REG =0x80
[238233.610409] c_can_pci_reset_pch: addr=f842c1fc reset=0x0
[238233.610589] c_can_pci 0000:02:0c.3: can0: setting BTR=0518 BRPE=0000

> Another question: at what rate do you send messages?

Bursts of 6 to 7 messages are sent every 100 ms.  The messages in a burst are
sent back to back without any delay, which is what I think is triggering this
driver bug.

- Mike 





^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 15:04                                                           ` Michael Pellegrini
@ 2012-11-23 17:00                                                             ` Wolfgang Grandegger
  2012-11-23 17:18                                                               ` Wolfgang Grandegger
  2012-11-24  7:16                                                               ` Wolfgang Grandegger
  0 siblings, 2 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-23 17:00 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/23/2012 04:04 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> Did you see any other related kernel messages? For real testing you
>> should remove the debug message above. I will try to add a more
>> sophisticated trigger.
> 
> I didn't see any other messages.  That message prints in such a tight
> loop that it's hard to catch any other messages.

That's clear. Therefore please remove that printk for testing. For a
quick test, could you please add spin_locks to c_can_do_rx_poll(), similar
to c_can_do_tx()? My suspicion is that there is a race in accessing the
message RAM. There is this infamous c_can_msg_obj_is_busy() in
c_can_object_get() and c_can_object_put().

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 17:00                                                             ` Wolfgang Grandegger
@ 2012-11-23 17:18                                                               ` Wolfgang Grandegger
  2012-11-23 17:52                                                                 ` Michael Pellegrini
  2012-11-24  7:16                                                               ` Wolfgang Grandegger
  1 sibling, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-23 17:18 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/23/2012 06:00 PM, Wolfgang Grandegger wrote:
> On 11/23/2012 04:04 PM, Michael Pellegrini wrote:
>> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>>
>>> Did you see any other related kernel messages? For real testing you
>>> should remove the debug message above. I will try to add a more
>>> sophisticated trigger.
>>
>> I didn't see any other messages.  That message prints in such a tight
>> loop that it's hard to catch any other messages.
> 
> That's clear. Therefore please remove that printk for testing. For a
> quick test could you please add spin_locks to c_can_do_rx_poll() similar
> to c_can_do_tx(). My suspicion is that there is a race in accessing the
> message ram. There is this infamous c_can_msg_obj_is_busy() in
> c_can_object_get() and c_can_object_put().

And please use:

CONFIG_CAN_DEBUG_DEVICES=y CONFIG_CAN_C_CAN_PCI=m make -C /usr/src/linux-headers-`uname -r` SUBDIRS=`pwd` modules

otherwise we will not have debug messages.

Wolfgang.


* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 17:18                                                               ` Wolfgang Grandegger
@ 2012-11-23 17:52                                                                 ` Michael Pellegrini
  2012-11-25 16:17                                                                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-23 17:52 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> > That's clear. Therefore please remove that printk for testing. For a
> > quick test could you please add spin_locks to c_can_do_rx_poll() similar
> > to c_can_do_tx(). My suspicion is that there is a race in accessing the
> > message ram. There is this infamous c_can_msg_obj_is_busy() in
> > c_can_object_get() and c_can_object_put().
> 
> And please use:
> 
> CONFIG_CAN_DEBUG_DEVICES=y CONFIG_CAN_C_CAN_PCI=m make -C /usr/src/linux-headers-`uname -r` SUBDIRS=`pwd` modules
> 
> otherwise we will not have debug messages.

Ok, I've applied the following patch:

*** ../c-can-pci-v5/c_can.c	2012-11-21 14:47:03.000000000 -0500
--- ./c_can.c	2012-11-23 12:27:05.946016919 -0500
*************** static int c_can_do_rx_poll(struct net_d
*** 795,800 ****
--- 795,803 ----
  	unsigned int msg_obj, msg_ctrl_save;
  	struct c_can_priv *priv = netdev_priv(dev);
  	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
+ 	unsigned long flags;
+ 
+ 	spin_lock_irqsave(&priv->lock, flags);
  
  	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
  			msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
*************** static int c_can_do_rx_poll(struct net_d
*** 843,848 ****
--- 846,853 ----
  		}
  	}
  
+ 	spin_unlock_irqrestore(&priv->lock, flags);
+ 
  	return num_rx_pkts;
  }
  
*************** static irqreturn_t c_can_isr(int irq, vo
*** 1084,1090 ****
  	struct c_can_priv *priv = netdev_priv(dev);
  
  	priv->irqstatus = priv->read_reg(priv, C_CAN_INT_REG);
! 	printk("%s: irqstatus=%#x\n", __func__, priv->irqstatus);
  	if (!priv->irqstatus)
  		return IRQ_NONE;
  
--- 1089,1095 ----
  	struct c_can_priv *priv = netdev_priv(dev);
  
  	priv->irqstatus = priv->read_reg(priv, C_CAN_INT_REG);
! 	//printk("%s: irqstatus=%#x\n", __func__, priv->irqstatus);
  	if (!priv->irqstatus)
  		return IRQ_NONE;

I also built the modules with debug messages enabled, per your instructions.
I'm seeing more output in dmesg now, so I think I was building with debug
messages disabled previously.  I apologize for that.

I will run a long-term test with this driver over the weekend and see how it
holds up.

Thanks,
Mike




* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 17:00                                                             ` Wolfgang Grandegger
  2012-11-23 17:18                                                               ` Wolfgang Grandegger
@ 2012-11-24  7:16                                                               ` Wolfgang Grandegger
  2012-11-26  3:33                                                                 ` Bhupesh SHARMA
  1 sibling, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-24  7:16 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can, bhupesh.sharma

On 11/23/2012 06:00 PM, Wolfgang Grandegger wrote:
> On 11/23/2012 04:04 PM, Michael Pellegrini wrote:
>> Wolfgang Grandegger <wg <at> grandegger.com> writes:
>>
>>> Did you see any other related kernel messages? For real testing you
>>> should remove the debug message above. I will try to add a more
>>> sophisticated trigger.
>>
>> I didn't see any other messages.  That message prints in such a tight
>> loop that it's hard to catch any other messages.
> 
> That's clear. Therefore please remove that printk for testing. For a
> quick test could you please add spin_locks to c_can_do_rx_poll() similar
> to c_can_do_tx(). My suspicion is that there is a race in accessing the
> message ram. There is this infamous c_can_msg_obj_is_busy() in
> c_can_object_get() and c_can_object_put().

At a closer look I realized that the c_can driver uses IF1 for both RX
and TX. The manual recommends using one IF for RX and the other for TX
to avoid conflicts between CPU accesses to the message RAM. I see this
implemented in the pch_can driver but not in the c_can driver. Bhupesh, any
comments? Maybe I have missed something.
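
To make the conflict concrete, here is a small user-space model, not driver
code. The names toy_can, if_regs, obj_get(), obj_put() and interleaved_tx()
are hypothetical, and the message RAM is just an array. It assumes the TX path
stages a frame in an interface register set, gets interrupted by an RX read,
and only then commits the frame to the message RAM.

```c
#include <stdint.h>

enum { IF_RX = 0, IF_TX = 1, NUM_OBJ = 32 };

/* One interface register set, reduced to arbitration ID plus data. */
struct if_regs {
	uint32_t arb;
	uint8_t data[8];
};

/* Toy controller: two interface register sets in front of the message RAM. */
struct toy_can {
	struct if_regs ifregs[2];
	struct if_regs ram[NUM_OBJ];
};

/* Message RAM -> interface registers (what an RX read does). */
static void obj_get(struct toy_can *c, int iface, int obj)
{
	c->ifregs[iface] = c->ram[obj];
}

/* Interface registers -> message RAM (what a TX commit does). */
static void obj_put(struct toy_can *c, int iface, int obj)
{
	c->ram[obj] = c->ifregs[iface];
}

/*
 * Stage a TX frame, let an RX read of object 0 run in between, then
 * commit the TX frame to object 16. Returns the ID that actually
 * ended up in the message RAM.
 */
static uint32_t interleaved_tx(struct toy_can *c, int tx_if, int rx_if,
			       uint32_t tx_id)
{
	c->ifregs[tx_if].arb = tx_id;	/* TX path stages the frame */
	obj_get(c, rx_if, 0);		/* RX path runs in between */
	obj_put(c, tx_if, 16);		/* TX path commits */
	return c->ram[16].arb;
}
```

With tx_if == rx_if the RX read overwrites the staged TX frame, so the wrong
ID is committed. With separate register sets the staged frame survives without
any locking between the two paths.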

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-23 17:52                                                                 ` Michael Pellegrini
@ 2012-11-25 16:17                                                                   ` Wolfgang Grandegger
  2012-11-26 14:54                                                                     ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-25 16:17 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can


Hi Michael,

On 11/23/2012 06:52 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>>> That's clear. Therefore please remove that printk for testing. For a
>>> quick test could you please add spin_locks to c_can_do_rx_poll() similar
>>> to c_can_do_tx(). My suspicion is that there is a race in accessing the
>>> message ram. There is this infamous c_can_msg_obj_is_busy() in
>>> c_can_object_get() and c_can_object_put().
>>
>> And please use:
>>
>> CONFIG_CAN_DEBUG_DEVICES=y CONFIG_CAN_C_CAN_PCI=m make -C /usr/src/linux-headers-`uname -r` SUBDIRS=`pwd` modules
>>
>> otherwise we will not have debug messages.
> 
> Ok, I've applied the following patch:
> 
> *** ../c-can-pci-v5/c_can.c	2012-11-21 14:47:03.000000000 -0500
> --- ./c_can.c	2012-11-23 12:27:05.946016919 -0500
> *************** static int c_can_do_rx_poll(struct net_d
> *** 795,800 ****
> --- 795,803 ----
>   	unsigned int msg_obj, msg_ctrl_save;
>   	struct c_can_priv *priv = netdev_priv(dev);
>   	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
> + 	unsigned long flags;
> + 
> + 	spin_lock_irqsave(&priv->lock, flags);
>   
>   	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
>   			msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
> *************** static int c_can_do_rx_poll(struct net_d
> *** 843,848 ****
> --- 846,853 ----
>   		}
>   	}
>   
> + 	spin_unlock_irqrestore(&priv->lock, flags);
> + 
>   	return num_rx_pkts;
>   }

There is a return in the critical section which must also be handled.
Hope you didn't hit it...
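
As an illustration of the pitfall (a user-space sketch, not the v7 change
itself): a C11 atomic_flag stands in for priv->lock, and fake_priv, poll_rx()
and the overrun_at parameter are hypothetical. The point is only that every
exit from the critical section goes through the unlock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private data. */
struct fake_priv {
	atomic_flag lock;	/* plays the role of priv->lock */
};

static void fake_lock(struct fake_priv *priv)
{
	while (atomic_flag_test_and_set(&priv->lock))
		;	/* spin until the holder clears the flag */
}

static void fake_unlock(struct fake_priv *priv)
{
	atomic_flag_clear(&priv->lock);
}

/*
 * Poll up to 'quota' message objects. 'overrun_at' marks the object on
 * which the early exit is taken; pass -1 for none. Every exit goes
 * through the 'out' label, so the lock is always dropped.
 */
static int poll_rx(struct fake_priv *priv, int quota, int overrun_at)
{
	int num_rx_pkts = 0;
	int obj;

	fake_lock(priv);
	for (obj = 0; obj < quota; obj++) {
		if (obj == overrun_at)
			goto out;	/* a bare 'return' here would leak the lock */
		num_rx_pkts++;
	}
out:
	fake_unlock(priv);
	return num_rx_pkts;
}

/* Returns true if nobody holds the lock. */
static bool lock_is_free(struct fake_priv *priv)
{
	if (atomic_flag_test_and_set(&priv->lock))
		return false;	/* somebody still holds it */
	atomic_flag_clear(&priv->lock);
	return true;
}
```

With a plain return in place of the goto, the next caller of fake_lock() would
spin forever, which in the driver would look like a hard lockup rather than a
stalled TX queue.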

I have attached v7 fixing this issue. Furthermore, I have added spinlock
protection to the PCH driver. It needs fixing, even if I want to get
rid of it as soon as possible. Could you please give this driver a try
as well? The README tells how to build the modules. I will also send my
current patch stack for the record (and feedback).

Wolfgang.

[-- Attachment #2: c-can-pci-v7.tar.bz2 --]
[-- Type: application/x-bzip, Size: 16825 bytes --]


* RE: pch_can: Data transmission stops after dropped packet
  2012-11-24  7:16                                                               ` Wolfgang Grandegger
@ 2012-11-26  3:33                                                                 ` Bhupesh SHARMA
  0 siblings, 0 replies; 79+ messages in thread
From: Bhupesh SHARMA @ 2012-11-26  3:33 UTC (permalink / raw)
  To: Wolfgang Grandegger, Michael Pellegrini; +Cc: linux-can

Hi Wolfgang,

> -----Original Message-----
> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Saturday, November 24, 2012 12:46 PM
> To: Michael Pellegrini
> Cc: linux-can@vger.kernel.org; Bhupesh SHARMA
> Subject: Re: pch_can: Data transmission stops after dropped packet
> 
> On 11/23/2012 06:00 PM, Wolfgang Grandegger wrote:
> > On 11/23/2012 04:04 PM, Michael Pellegrini wrote:
> >> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> >>
> >>> Did you see any other related kernel messages? For real testing you
> >>> should remove the debug message above. I will try to add a more
> >>> sophisticated trigger.
> >>
> >> I didn't see any other messages.  That message prints in such a tight
> >> loop that it's hard to catch any other messages.
> >
> > That's clear. Therefore please remove that printk for testing. For a
> > quick test could you please add spin_locks to c_can_do_rx_poll()
> > similar to c_can_do_tx(). My suspicion is that there is a race in
> > accessing the message ram. There is this infamous
> > c_can_msg_obj_is_busy() in
> > c_can_object_get() and c_can_object_put().
> 
> At a closer look I realized that the c_can driver uses IF1 for both RX and TX.
> The manual recommends using one IF for RX and the other for TX to avoid
> conflicts between CPU accesses to the message RAM. I see this
> implemented in the pch_can driver but not in the c_can driver. Bhupesh, any
> comments? Maybe I have missed something.
> 

Yes, at the moment the C_CAN driver code uses only IF1 for both Tx and Rx.
It was on my to-do list, but I got moved to another project, so I wasn't able
to work on this topic.

However, the changes are very easy, and I will try to work on them and circulate
some patches by early December. In the meantime, if someone else has some time and
a C_CAN platform to test the patches, I will be more than happy :)

Regards,
Bhupesh


* Re: pch_can: Data transmission stops after dropped packet
  2012-11-25 16:17                                                                   ` Wolfgang Grandegger
@ 2012-11-26 14:54                                                                     ` Michael Pellegrini
  2012-11-26 15:30                                                                       ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-26 14:54 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> There is a return in the critical section which must also be handled.
> Hope you didn't hit it...
> 
> I have attached v7 fixing this issue. Furthermore I have added spinlock
> protection to the PCH driver. It needs fixing, even if I want to get
> rid of it as soon as possible. Could you please give this driver a try
> as well? The README tells how to build the modules. I will also send my
> current patch stack for the record (and feedback).

Oops, I missed that return.  Looks like the system didn't hit it though,
the CAN interface was still functional after running continuously over the 
weekend.

I tried the PCH driver and hit the transmission failure within a minute.
I'm happy to test out more changes to this driver if you think it is worth
pursuing.

I started a test with the new c_can driver.  I'll check on it throughout
the day and let it run overnight as well.

Thanks,
Mike






* Re: pch_can: Data transmission stops after dropped packet
  2012-11-26 14:54                                                                     ` Michael Pellegrini
@ 2012-11-26 15:30                                                                       ` Wolfgang Grandegger
  2012-11-26 17:30                                                                         ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-26 15:30 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/26/2012 03:54 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> There is a return in the critical section which must also be handled.
>> Hope you didn't hit it...
>>
>> I have attached v7 fixing this issue. Furthermore I have added spinlock
>> protection to the PCH driver. It needs fixing, even if I want to get
>> rid of it as soon as possible. Could you please give this driver a try
>> as well? The README tells how to build the modules. I will also send my
>> current patch stack for the record (and feedback).
> 
> Oops, I missed that return.  Looks like the system didn't hit it though,
> the CAN interface was still functional after running continuously over the 
> weekend.

Not too bad! The return only happens at high load. When you
"ifconfig up" the device, some kernel messages are printed. Could you
please show them? I want to understand whether the reset really occurs by
checking some register values.

> I tried the PCH driver and hit the transmission failure within a minute.

Ah. In the function pch_xmit(), could you please move

	spin_unlock_irqrestore(&priv->lock, flags);

to the end of the function just before

 	return NETDEV_TX_OK;

and then retry. This would fix races with accessing the message ram as
well (via pch_can_rw_msg_obj). I missed that.

> I'm happy to test out more changes to this driver if you think it is worth
> pursuing.

Remote debugging is slow, unfortunately. Thanks for your patience.

> I started a test with the new c_can driver.  I'll check on it throughout
> the day and let it run overnight as well.

OK, apart from the return issue above the driver has not changed from
the functional point of view.

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-26 15:30                                                                       ` Wolfgang Grandegger
@ 2012-11-26 17:30                                                                         ` Michael Pellegrini
  2012-11-26 18:13                                                                           ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-26 17:30 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> Not too bad! The return only happens at high load. When you
> "ifconfig up" the device, some kernel messages are printed. Could you
> please show them? I want to understand whether the reset really occurs by
> checking some register values.

"ifconfig can0 down" followed by "ifconfig can0 up" produces the following dmesg
output:

[10113.228189] CTRL_REG=0x1
[10113.228213] BTR_REG =0x2301
[10113.228230] TEST_REG =0x80
[10113.228267] c_can_pci 0000:02:0c.3: can0: obj no:1, msgval:0x00000000
[10113.228303] c_can_pci 0000:02:0c.3: can0: obj no:2, msgval:0x00000000
[10113.228330] c_can_pci 0000:02:0c.3: can0: obj no:3, msgval:0x00000000
[10113.228356] c_can_pci 0000:02:0c.3: can0: obj no:4, msgval:0x00000000
[10113.228381] c_can_pci 0000:02:0c.3: can0: obj no:5, msgval:0x00000000
[10113.228407] c_can_pci 0000:02:0c.3: can0: obj no:6, msgval:0x00000000
[10113.228433] c_can_pci 0000:02:0c.3: can0: obj no:7, msgval:0x00000000
[10113.228458] c_can_pci 0000:02:0c.3: can0: obj no:8, msgval:0x00000000
[10113.228484] c_can_pci 0000:02:0c.3: can0: obj no:9, msgval:0x00000000
[10113.228510] c_can_pci 0000:02:0c.3: can0: obj no:10, msgval:0x00000000
[10113.228536] c_can_pci 0000:02:0c.3: can0: obj no:11, msgval:0x00000000
[10113.228562] c_can_pci 0000:02:0c.3: can0: obj no:12, msgval:0x00000000
[10113.228587] c_can_pci 0000:02:0c.3: can0: obj no:13, msgval:0x00000000
[10113.228613] c_can_pci 0000:02:0c.3: can0: obj no:14, msgval:0x00000000
[10113.228639] c_can_pci 0000:02:0c.3: can0: obj no:15, msgval:0x00000000
[10113.228665] c_can_pci 0000:02:0c.3: can0: obj no:16, msgval:0x00000000
[10113.228691] c_can_pci 0000:02:0c.3: can0: obj no:17, msgval:0x00000000
[10113.228716] c_can_pci 0000:02:0c.3: can0: obj no:18, msgval:0x00000000
[10113.228742] c_can_pci 0000:02:0c.3: can0: obj no:19, msgval:0x00000000
[10113.228768] c_can_pci 0000:02:0c.3: can0: obj no:20, msgval:0x00000000
[10113.228794] c_can_pci 0000:02:0c.3: can0: obj no:21, msgval:0x00000000
[10113.228820] c_can_pci 0000:02:0c.3: can0: obj no:22, msgval:0x00000000
[10113.228845] c_can_pci 0000:02:0c.3: can0: obj no:23, msgval:0x00000000
[10113.228871] c_can_pci 0000:02:0c.3: can0: obj no:24, msgval:0x00000000
[10113.228897] c_can_pci 0000:02:0c.3: can0: obj no:25, msgval:0x00000000
[10113.228923] c_can_pci 0000:02:0c.3: can0: obj no:26, msgval:0x00000000
[10113.228949] c_can_pci 0000:02:0c.3: can0: obj no:27, msgval:0x00000000
[10113.228974] c_can_pci 0000:02:0c.3: can0: obj no:28, msgval:0x00000000
[10113.229000] c_can_pci 0000:02:0c.3: can0: obj no:29, msgval:0x00000000
[10113.229026] c_can_pci 0000:02:0c.3: can0: obj no:30, msgval:0x00000000
[10113.229052] c_can_pci 0000:02:0c.3: can0: obj no:31, msgval:0x00000000
[10113.229078] c_can_pci 0000:02:0c.3: can0: obj no:32, msgval:0x00000000
[10113.229105] c_can_pci 0000:02:0c.3: can0: obj no:1, msgval:0x00000001
[10113.229132] c_can_pci 0000:02:0c.3: can0: obj no:2, msgval:0x00000003
[10113.229159] c_can_pci 0000:02:0c.3: can0: obj no:3, msgval:0x00000007
[10113.229185] c_can_pci 0000:02:0c.3: can0: obj no:4, msgval:0x0000000f
[10113.229212] c_can_pci 0000:02:0c.3: can0: obj no:5, msgval:0x0000001f
[10113.229239] c_can_pci 0000:02:0c.3: can0: obj no:6, msgval:0x0000003f
[10113.229266] c_can_pci 0000:02:0c.3: can0: obj no:7, msgval:0x0000007f
[10113.229293] c_can_pci 0000:02:0c.3: can0: obj no:8, msgval:0x000000ff
[10113.229320] c_can_pci 0000:02:0c.3: can0: obj no:9, msgval:0x000001ff
[10113.229347] c_can_pci 0000:02:0c.3: can0: obj no:10, msgval:0x000003ff
[10113.229373] c_can_pci 0000:02:0c.3: can0: obj no:11, msgval:0x000007ff
[10113.229400] c_can_pci 0000:02:0c.3: can0: obj no:12, msgval:0x00000fff
[10113.229427] c_can_pci 0000:02:0c.3: can0: obj no:13, msgval:0x00001fff
[10113.229455] c_can_pci 0000:02:0c.3: can0: obj no:14, msgval:0x00003fff
[10113.229481] c_can_pci 0000:02:0c.3: can0: obj no:15, msgval:0x00007fff
[10113.229508] c_can_pci 0000:02:0c.3: can0: obj no:16, msgval:0x0000ffff
[10113.229527] c_can_pci 0000:02:0c.3: can0: setting BTR=0518 BRPE=0000

Note that I used v7 of the driver to get this data.

> > I tried the PCH driver and hit the transmission failure within a minute.
> 
> Ah. In the function pch_xmit(), could you please move
> 
> 	spin_unlock_irqrestore(&priv->lock, flags);
> 
> to the end of the function just before
> 
>  	return NETDEV_TX_OK;
> 
> and then retry. This would fix races with accessing the message ram as
> well (via pch_can_rw_msg_obj). I missed that.

Alright, I applied the following patch:

*** ../c-can-pci-v7/pch_can.c	2012-11-25 05:09:13.000000000 -0500
--- ./pch_can.c	2012-11-26 11:29:11.350012074 -0500
*************** static netdev_tx_t pch_xmit(struct sk_bu
*** 921,928 ****
  		priv->tx_obj++;
  	}
  
- 	spin_unlock_irqrestore(&priv->lock, flags);
- 
  	/* Setting the CMASK register. */
  	pch_can_bit_set(&priv->regs->ifregs[1].cmask, PCH_CMASK_ALL);
  
--- 921,926 ----
*************** static netdev_tx_t pch_xmit(struct sk_bu
*** 957,962 ****
--- 955,962 ----
  
  	pch_can_rw_msg_obj(&priv->regs->ifregs[1].creq, tx_obj_no);
  
+ 	spin_unlock_irqrestore(&priv->lock, flags);
+ 
  	return NETDEV_TX_OK;
  }

The patched driver did not fail in the first few minutes, so that's a good sign.
I will run this driver overnight.

> > I'm happy to test out more changes to this driver if you think it is worth
> > pursuing.
> 
> Remote debugging is slow, unfortunately. Thanks for your patience.

No problem.  I'm just thankful that the problem is getting addressed.

> > I started a test with the new c_can driver.  I'll check on it throughout
> > the day and let it run overnight as well.
> 
> OK, apart from the return issue above the driver has not changed from
> the functional point of view.

Alright, I will wait until more substantial changes are implemented before
re-running the long-term test on this driver.

Thanks,
Mike





* Re: pch_can: Data transmission stops after dropped packet
  2012-11-26 17:30                                                                         ` Michael Pellegrini
@ 2012-11-26 18:13                                                                           ` Wolfgang Grandegger
  2012-11-29 12:15                                                                             ` Wolfgang Grandegger
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-26 18:13 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 11/26/2012 06:30 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> Not too bad! The return only happens at high load. When you
>> "ifconfig up" the device, some kernel messages are printed. Could you
>> please show them? I want to understand whether the reset really occurs by
>> checking some register values.
> 
> "ifconfig can0 down" followed by "ifconfig can0 up" produces the following dmesg
> output:
> 
> [10113.228189] CTRL_REG=0x1
> [10113.228213] BTR_REG =0x2301
> [10113.228230] TEST_REG =0x80

These are the correct reset values. Therefore, a reset did occur.
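
Spelled out as a check (a sketch only; struct can_regs and
looks_freshly_reset() are hypothetical names, and the three constants are
simply the values from the log quoted above, taken at face value):

```c
#include <stdbool.h>
#include <stdint.h>

/* Snapshot of the three registers printed by the debug code. */
struct can_regs {
	uint32_t ctrl;
	uint32_t btr;
	uint32_t test;
};

/* Values seen right after "ifconfig can0 up" in the log above. */
#define CTRL_RESET	0x0001u
#define BTR_RESET	0x2301u
#define TEST_RESET	0x0080u

/* True if the snapshot matches the values expected after a reset. */
static bool looks_freshly_reset(const struct can_regs *r)
{
	return r->ctrl == CTRL_RESET &&
	       r->btr == BTR_RESET &&
	       r->test == TEST_RESET;
}
```

The values printed at module load earlier in the thread (CTRL_REG=0x0,
BTR_REG=0x518) would not pass this check, which is what distinguishes a real
reset from a controller that still carries its previous configuration.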

> Note that I used v7 of the driver to get this data.
> 
>>> I tried the PCH driver and hit the transmission failure within a minute.
>>
>> Ah. In the function pch_xmit(), could you please move
>>
>> 	spin_unlock_irqrestore(&priv->lock, flags);
>>
>> to the end of the function just before
>>
>>  	return NETDEV_TX_OK;
>>
>> and then retry. This would fix races with accessing the message ram as
>> well (via pch_can_rw_msg_obj). I missed that.
> 
> Alright, I applied the following patch:
> 
> *** ../c-can-pci-v7/pch_can.c	2012-11-25 05:09:13.000000000 -0500
> --- ./pch_can.c	2012-11-26 11:29:11.350012074 -0500
> *************** static netdev_tx_t pch_xmit(struct sk_bu
> *** 921,928 ****
>   		priv->tx_obj++;
>   	}
>   
> - 	spin_unlock_irqrestore(&priv->lock, flags);
> - 
>   	/* Setting the CMASK register. */
>   	pch_can_bit_set(&priv->regs->ifregs[1].cmask, PCH_CMASK_ALL);
>   
> --- 921,926 ----
> *************** static netdev_tx_t pch_xmit(struct sk_bu
> *** 957,962 ****
> --- 955,962 ----
>   
>   	pch_can_rw_msg_obj(&priv->regs->ifregs[1].creq, tx_obj_no);
>   
> + 	spin_unlock_irqrestore(&priv->lock, flags);
> + 
>   	return NETDEV_TX_OK;
>   }
> 
> The patched driver did not fail in the first few minutes, so that's a good sign.
> I will run this driver overnight.

OK, thanks. I think the problem is fixed.

>>> I'm happy to test out more changes to this driver if you think it is worth
>>> pursuing.
>>
>> Remote debugging is slow, unfortunately. Thanks for your patience.
> 
> No problem.  I'm just thankful that the problem is getting addressed.
> 
>>> I started a test with the new c_can driver.  I'll check on it throughout
>>> the day and let it run overnight as well.
>>
>> OK, apart from the return issue above the driver has not changed from
>> the functional point of view.
> 
> Alright, I will wait until more substantial changes are implemented before
> re-running the long-term test on this driver.

Yes, maybe we can come up with an even better solution.

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-11-26 18:13                                                                           ` Wolfgang Grandegger
@ 2012-11-29 12:15                                                                             ` Wolfgang Grandegger
  2012-11-29 14:15                                                                               ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-11-29 12:15 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can


Hi Michael,

On 11/26/2012 07:13 PM, Wolfgang Grandegger wrote:
> On 11/26/2012 06:30 PM, Michael Pellegrini wrote:
...
>> Alright, I will wait until more substantial changes are implemented before
>> re-running the long-term test on this driver.
> 
> Yes, maybe we can come up with an even better solution.

I have implemented a better solution now using different sets of
registers for tx and rx. This avoids locking in the RX path as well.
Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
could give the pch_can and c_can_pci driver a try when time permits.

Thanks,

Wolfgang.



[-- Attachment #2: c-can-pci-v8.tar.bz2 --]
[-- Type: application/x-bzip, Size: 16856 bytes --]


* Re: pch_can: Data transmission stops after dropped packet
  2012-11-29 12:15                                                                             ` Wolfgang Grandegger
@ 2012-11-29 14:15                                                                               ` Michael Pellegrini
  2012-12-06 14:20                                                                                 ` Michael Pellegrini
  0 siblings, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-11-29 14:15 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> I have implemented a better solution now using different sets of
> registers for tx and rx. This avoids locking in the RX path as well.
> Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
> could give the pch_can and c_can_pci driver a try when time permits.

Great! I'm happy to test these drivers out.  In addition to testing
these drivers through my application, I'm looking at creating a soak test to
really push the interface (and the driver) hard.  That test will probably be
ready early next week.  I will get back to you with the results.

Thank you!
Mike






* Re: pch_can: Data transmission stops after dropped packet
  2012-11-29 14:15                                                                               ` Michael Pellegrini
@ 2012-12-06 14:20                                                                                 ` Michael Pellegrini
  2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
  2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
  0 siblings, 2 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-12-06 14:20 UTC (permalink / raw)
  To: linux-can

Michael Pellegrini <mikep86 <at> gmail.com> writes:

> > I have implemented a better solution now using different sets of
> > registers for tx and rx. This avoids locking in the RX path as well.
> > Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
> > could give the pch_can and c_can_pci driver a try when time permits.
> 
> Great! I'm happy to test these drivers out.  In addition to testing
> these drivers through my application, I'm looking at creating a soak test to
> really push the interface (and the driver) hard.  That test will probably be
> ready early next week.  I will get back to you with the results.

I have some results.  

The good news: The c_can driver appears to work very well.  It has been running
successfully under the soak test for 18 hours and counting.

The bad news: The pch_can driver is exhibiting the transmission problem again.
The soak test causes the driver to fail within minutes.

Details of the soak test:

There are two systems involved in the test: the PCH-System and an External Node.
The External Node transmits data at a high rate, bringing bus utilization to
~35%.
The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
Combined, the two systems utilize ~90% of bus bandwidth.
The PCH-System is constantly checking that it is receiving data from the
External Node at the expected rate and in the expected order.
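
A rough sketch of the two checks, for reference. This is not the actual test
code: it assumes the External Node puts an 8-bit rolling counter into each
frame, and seq_checker, seq_check() and msgs_per_second() are names made up
for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tracks the next expected value of an 8-bit rolling counter. */
struct seq_checker {
	bool primed;		/* false until the first frame is seen */
	uint8_t expected;	/* counter value the next frame should carry */
	unsigned long errors;	/* number of out-of-order or lost frames */
};

/* Feed one received counter value; returns true if it was in order. */
static bool seq_check(struct seq_checker *s, uint8_t seq)
{
	bool ok = !s->primed || seq == s->expected;

	if (!ok)
		s->errors++;
	s->primed = true;
	s->expected = (uint8_t)(seq + 1);	/* wraps from 255 to 0 */
	return ok;
}

/* Average message rate of 'burst' messages every 'period_ms'. */
static unsigned int msgs_per_second(unsigned int burst, unsigned int period_ms)
{
	return burst * 1000u / period_ms;
}
```

With these numbers the PCH-System side works out to msgs_per_second(10, 5),
i.e. 2000 messages per second, compared with at most 70 per second in the
original application (7 messages every 100 ms).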

- Mike





* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:20                                                                                 ` Michael Pellegrini
@ 2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
  2012-12-06 14:41                                                                                     ` Wolfgang Grandegger
  2012-12-06 14:42                                                                                     ` Michael Pellegrini
  2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
  1 sibling, 2 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2012-12-06 14:23 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can


On 12/06/2012 03:20 PM, Michael Pellegrini wrote:
> Michael Pellegrini <mikep86 <at> gmail.com> writes:
> 
>>> I have implemented a better solution now using different sets of
>>> registers for tx and rx. This avoids locking in the RX path as well.
>>> Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
>>> could give the pch_can and c_can_pci driver a try when time permits.
>>
>> Great! I'm happy to test these drivers out.  In addition to testing
>> these drivers through my application, I'm looking at creating a soak test to
>> really push the interface (and the driver) hard.  That test will probably be
>> ready early next week.  I will get back to you with the results.
> 
> I have some results.  
> 
> The good news: The c_can driver appears to work very well.  It has been running
> successfully under the soak test for 18 hours and counting.
> 
> The bad news: The pch_can driver is exhibiting the transmission problem again.
> The soak test causes the driver to fail within minutes.

We want to kick the pch_can driver anyways.

> Details of the soak test:
> 
> There are two systems involved in the test: the PCH-System and an External Node.
> The External Node transmits data at a high rate, bringing bus utilization to
> ~35%.
> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
> Combined, the two systems utilize ~90% of bus bandwidth.
> The PCH-System is constantly checking that it is receiving data from the
> External Node at the expected rate and in the expected order.

Which patches do you have applied to the c_can driver?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
@ 2012-12-06 14:41                                                                                     ` Wolfgang Grandegger
  2012-12-06 14:42                                                                                       ` Marc Kleine-Budde
  2012-12-06 14:42                                                                                     ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-12-06 14:41 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Michael Pellegrini, linux-can

On 12/06/2012 03:23 PM, Marc Kleine-Budde wrote:
> On 12/06/2012 03:20 PM, Michael Pellegrini wrote:
>> Michael Pellegrini <mikep86 <at> gmail.com> writes:
>>
>>>> I have implemented a better solution now using different sets of
>>>> registers for tx and rx. This avoids locking in the RX path as well.
>>>> Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
>>>> could give the pch_can and c_can_pci driver a try when time permits.
>>>
>>> Great! I'm happy to test these drivers out.  In addition to testing
>>> these drivers through my application, I'm looking at creating a soak test to
>>> really push the interface (and the driver) hard.  That test will probably be
>>> ready early next week.  I will get back to you with the results.
>>
>> I have some results.  
>>
>> The good news: The c_can driver appears to work very well.  It has been running
>> successfully under the soak test for 18 hours and counting.
>>
>> The bad news: The pch_can driver is exhibiting the transmission problem again.
>> The soak test causes the driver to fail within minutes.
> 
> We want to kick the pch_can driver anyways.
> 
>> Details of the soak test:
>>
>> There are two systems involved in the test: the PCH-System and an External Node.
>> The External Node transmits data at a high rate, bringing bus utilization to
>> ~35%.
>> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
>> Combined, the two systems utilize ~90% of bus bandwidth.
>> The PCH-System is constantly checking that it is receiving data from the
>> External Node at the expected rate and in the expected order.
> 
> Which patches do you have applied to the c_can driver?

I provided out-of-tree sources of my "RFC v2" patches, allowing him to
easily build them for his Ubuntu system (kernel v3.2).

Wolfgang.



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:41                                                                                     ` Wolfgang Grandegger
@ 2012-12-06 14:42                                                                                       ` Marc Kleine-Budde
  0 siblings, 0 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2012-12-06 14:42 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Michael Pellegrini, linux-can


On 12/06/2012 03:41 PM, Wolfgang Grandegger wrote:
>> Which patches do you have applied to the c_can driver?
> 
> I provided out-of-tree sources of my "RFC v2" patches allowing him to
> easily build it for his UBUNTU v3.2 system.

tnx

Marc


-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
  2012-12-06 14:41                                                                                     ` Wolfgang Grandegger
@ 2012-12-06 14:42                                                                                     ` Michael Pellegrini
  1 sibling, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-12-06 14:42 UTC (permalink / raw)
  To: linux-can

Marc Kleine-Budde <mkl <at> pengutronix.de> writes:

> Which patches do you have applied to the c_can driver?

I am using v8 of the c_can driver posted by Wolfgang in this thread.

- Mike



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:20                                                                                 ` Michael Pellegrini
  2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
@ 2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
  2012-12-06 17:05                                                                                     ` Alexander Stein
  2012-12-06 22:11                                                                                     ` Michael Pellegrini
  1 sibling, 2 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-12-06 14:49 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

Hi Michael,

On 12/06/2012 03:20 PM, Michael Pellegrini wrote:
> Michael Pellegrini <mikep86 <at> gmail.com> writes:
> 
>>> I have implemented a better solution now using different sets of
>>> registers for tx and rx. This avoids locking in the RX path as well.
>>> Furthermore, pch_can now uses spin_[un]lock_bh. Would be nice if you
>>> could give the pch_can and c_can_pci driver a try when time permits.
>>
>> Great! I'm happy to test these drivers out.  In addition to testing
>> these drivers through my application, I'm looking at creating a soak test to
>> really push the interface (and the driver) hard.  That test will probably be
>> ready early next week.  I will get back to you with the results.
> 
> I have some results.  
> 
> The good news: The c_can driver appears to work very well.  It has been running
> successfully under the soak test for 18 hours and counting.

Sounds good.

> The bad news: The pch_can driver is exhibiting the transmission problem again.
> The soak test causes the driver to fail within minutes.

OK, obviously spin_[un]lock_bh() is not suitable to avoid the race. I
will switch back to spin_lock_irqsave/spin_unlock_irqrestore in the next
version of the patch.

> Details of the soak test:
> 
> There are two systems involved in the test: the PCH-System and an External Node.
> The External Node transmits data at a high rate, bringing bus utilization to
> ~35%.
> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
> Combined, the two systems utilize ~90% of bus bandwidth.
> The PCH-System is constantly checking that it is receiving data from the
> External Node at the expected rate and in the expected order.

On another thread Alexander is reporting problems with the same driver
when he runs an I2C application concurrently. Are you able to stress the
system in a similar way?

Thanks a lot for testing.

Wolfgang.


* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
@ 2012-12-06 17:05                                                                                     ` Alexander Stein
  2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
  2012-12-06 22:11                                                                                     ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Alexander Stein @ 2012-12-06 17:05 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can, Wolfgang Grandegger

Hi Michael,

On Thursday 06 December 2012 15:49:03, Wolfgang Grandegger wrote:
> > Details of the soak test:
> > 
> > There are two systems involved in the test: the PCH-System and an External Node.
> > The External Node transmits data at a high rate, bringing bus utilization to
> > ~35%.
> > The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
> > Combined, the two systems utilize ~90% of bus bandwidth.
> > The PCH-System is constantly checking that it is receiving data from the
> > External Node at the expected rate and in the expected order.

So you do a lot of transmission and reception of CAN frames?

> On another thread Alexander is reporting problems with the same driver
> when he runs an I2C application concurrently. Are you able to stress the
> system in a similar way?

Could you please test with the following patch? Do you see error messages from this patch?
Thanks!

Alexander

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index d63aaa3..da9bbc0 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -1186,6 +1186,7 @@ struct net_device *alloc_c_can_dev(void)
                                        CAN_CTRLMODE_BERR_REPORTING;
 
        spin_lock_init(&priv->lock);
+       spin_lock_init(&priv->testlock);
 
        return dev;
 }
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 3487d5e..2b58b75 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -173,6 +173,7 @@ struct c_can_priv {
        unsigned int instance;
        void (*init) (const struct c_can_priv *priv, bool enable);
        spinlock_t lock;        /* to protect tx and rx message objects */
+       spinlock_t testlock;    /* to protect tx and rx message objects */
 };
 
 struct net_device *alloc_c_can_dev(void);
diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
index 2516ea9..0ac4d43 100644
--- a/drivers/net/can/c_can/c_can_pci.c
+++ b/drivers/net/can/c_can/c_can_pci.c
@@ -74,13 +74,37 @@ static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
 static u16 c_can_pci_read_reg_32bit(struct c_can_priv *priv,
                                    enum reg index)
 {
-       return (u16)ioread32(priv->base + 2 * priv->regs[index]);
+       unsigned long flags;
+       u16 reg;
+
+       spin_lock_irqsave(&priv->testlock, flags);
+       reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
+       spin_unlock_irqrestore(&priv->testlock, flags);
+
+       return reg;
 }
 
 static void c_can_pci_write_reg_32bit(struct c_can_priv *priv,
                                      enum reg index, u16 val)
 {
-       iowrite32((u32)val, priv->base + 2 * priv->regs[index]);
+       u16 reg;
+       unsigned long flags;
+       int retries;
+
+       retries = 0;
+
+       spin_lock_irqsave(&priv->testlock, flags);
+
+       do
+       {
+               iowrite32((u32)val, priv->base + 2 * priv->regs[index]);
+               reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
+               if (reg != val)
+               {
+                       netdev_err(priv->dev, "write 0x%x to offset 0x%x failed. got: 0x%x\n", val, 2 * priv->regs[index], reg);
+               }
+       } while ((reg != val) && (retries++ < 20));
+       spin_unlock_irqrestore(&priv->testlock, flags);
 }
 
 static void c_can_pci_reset_pch(const struct c_can_priv *priv, bool enable)



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 17:05                                                                                     ` Alexander Stein
@ 2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
  2012-12-06 23:24                                                                                         ` Marc Kleine-Budde
                                                                                                           ` (2 more replies)
  0 siblings, 3 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-12-06 22:02 UTC (permalink / raw)
  To: Alexander Stein; +Cc: Michael Pellegrini, linux-can

Hi Michael,

On 12/06/2012 06:05 PM, Alexander Stein wrote:
> Hi Michael,
> 
> On Thursday 06 December 2012 15:49:03, Wolfgang Grandegger wrote:
>>> Details of the soak test:
>>>
>>> There are two systems involved in the test: the PCH-System and an External Node.
>>> The External Node transmits data at a high rate, bringing bus utilization to
>>> ~35%.
>>> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
>>> Combined, the two systems utilize ~90% of bus bandwidth.
>>> The PCH-System is constantly checking that it is receiving data from the
>>> External Node at the expected rate and in the expected order.
> 
> So you do a lot of transmit and reception of CAN frames?
> 
>> On another thread Alexander is reporting problems with the same driver
>> when he runs an I2C application concurrently. Are you able to stress the
>> system in a similar way?
> 
> Could you please test with the following patch? Do you see error messages from this patch?
> Thanks!

To summarize my understanding of your problem(s): as long as there are
no I2C transfers, everything works fine, right? The patch below does
report some write-readback failures, but that's due to reserved read-only
bits. I assume that you also use my "RFC v2" patches for c_can.
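
A write-readback test can avoid such false alarms by ignoring reserved or
read-only bits in the comparison. The following is a minimal user-space
sketch of that idea, not part of the posted patch: readback_matches and
ro_mask are hypothetical names, and the actual set of read-only bits has to
be taken from the controller data sheet.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compare a written value with its readback, ignoring bits that the
 * hardware does not let software change. 'ro_mask' has a 1 for every
 * reserved or read-only bit. Which bits those are is an assumption here
 * and must come from the controller documentation.
 */
static bool readback_matches(uint16_t written, uint16_t readback,
			     uint16_t ro_mask)
{
	/* XOR yields the differing bits; drop the ones software cannot set. */
	return ((written ^ readback) & (uint16_t)~ro_mask) == 0;
}
```

With such a helper, a register whose bit 13 always reads as 1 (mask 0x2000,
chosen only as an example) would no longer be reported when 0x0 is written
and 0x2000 is read back, while a genuine mismatch in the writable bits
would still be caught.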

Trouble starts with concurrent I2C transfers. Then the protected
write-readback test fails, which I regard as abnormal hardware behavior,
resulting in message losses and out-of-order reception.

Would be interesting to compare the hardware. Michael, could you also
show the output of "lspci -vv".

Wolfgang.

> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
> index d63aaa3..da9bbc0 100644
> --- a/drivers/net/can/c_can/c_can.c
> +++ b/drivers/net/can/c_can/c_can.c
> @@ -1186,6 +1186,7 @@ struct net_device *alloc_c_can_dev(void)
>                                         CAN_CTRLMODE_BERR_REPORTING;
>  
>         spin_lock_init(&priv->lock);
> +       spin_lock_init(&priv->testlock);
>  
>         return dev;
>  }
> diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
> index 3487d5e..2b58b75 100644
> --- a/drivers/net/can/c_can/c_can.h
> +++ b/drivers/net/can/c_can/c_can.h
> @@ -173,6 +173,7 @@ struct c_can_priv {
>         unsigned int instance;
>         void (*init) (const struct c_can_priv *priv, bool enable);
>         spinlock_t lock;        /* to protect tx and rx message objects */
> +       spinlock_t testlock;    /* to protect tx and rx message objects */
>  };
>  
>  struct net_device *alloc_c_can_dev(void);
> diff --git a/drivers/net/can/c_can/c_can_pci.c b/drivers/net/can/c_can/c_can_pci.c
> index 2516ea9..0ac4d43 100644
> --- a/drivers/net/can/c_can/c_can_pci.c
> +++ b/drivers/net/can/c_can/c_can_pci.c
> @@ -74,13 +74,37 @@ static void c_can_pci_write_reg_aligned_to_32bit(struct c_can_priv *priv,
>  static u16 c_can_pci_read_reg_32bit(struct c_can_priv *priv,
>                                     enum reg index)
>  {
> -       return (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +       unsigned long flags;
> +       u16 reg;
> +
> +       spin_lock_irqsave(&priv->testlock, flags);
> +       reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +       spin_unlock_irqrestore(&priv->testlock, flags);
> +
> +       return reg;
>  }
>  
>  static void c_can_pci_write_reg_32bit(struct c_can_priv *priv,
>                                       enum reg index, u16 val)
>  {
> -       iowrite32((u32)val, priv->base + 2 * priv->regs[index]);
> +       u16 reg;
> +       unsigned long flags;
> +       int retries;
> +
> +       retries = 0;
> +
> +       spin_lock_irqsave(&priv->testlock, flags);
> +
> +       do
> +       {
> +               iowrite32((u32)val, priv->base + 2 * priv->regs[index]);
> +               reg = (u16)ioread32(priv->base + 2 * priv->regs[index]);
> +               if (reg != val)
> +               {
> +                       netdev_err(priv->dev, "write 0x%x to offset 0x%x failed. got: 0x%x\n", val, 2 * priv->regs[index], reg);
> +               }
> +       } while ((reg != val) && (retries++ < 20));
> +       spin_unlock_irqrestore(&priv->testlock, flags);
>  }
>  
>  static void c_can_pci_reset_pch(const struct c_can_priv *priv, bool enable)
> 
> 



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
  2012-12-06 17:05                                                                                     ` Alexander Stein
@ 2012-12-06 22:11                                                                                     ` Michael Pellegrini
  2012-12-06 23:23                                                                                       ` Michael Pellegrini
  1 sibling, 1 reply; 79+ messages in thread
From: Michael Pellegrini @ 2012-12-06 22:11 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> On another thread Alexander is reporting problems with the same driver
> when he runs an I2C application concurrently. Are you able to stress the
> system in a similar way?
> 
> Thanks a lot for testing.

I don't have an I2C application on hand; let me see what I can whip up.

- Mike





* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 22:11                                                                                     ` Michael Pellegrini
@ 2012-12-06 23:23                                                                                       ` Michael Pellegrini
  0 siblings, 0 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-12-06 23:23 UTC (permalink / raw)
  To: linux-can

Michael Pellegrini <mikep86 <at> gmail.com> writes:

> I don't have an I2C application on-hand, let me see what I can whip up.

I tried loading the i2c-dev module and no i2c devices came up in /dev or
/sys/class/i2c-dev.  I am using the in-tree version of the driver.

In case it matters, my system automatically loaded the i2c_isch driver, but
not the i2c-dev driver.  I had to modprobe the latter manually.

I tried unloading both drivers, then loading just i2c-dev.  Still no i2c
devices.

Where should I go from here?

Thanks,
Mike



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
@ 2012-12-06 23:24                                                                                         ` Marc Kleine-Budde
  2012-12-10  8:21                                                                                         ` Alexander Stein
  2012-12-11 14:46                                                                                         ` Michael Pellegrini
  2 siblings, 0 replies; 79+ messages in thread
From: Marc Kleine-Budde @ 2012-12-06 23:24 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Alexander Stein, Michael Pellegrini, linux-can


On 12/06/2012 11:02 PM, Wolfgang Grandegger wrote:
> Hi Michael,
> 
> On 12/06/2012 06:05 PM, Alexander Stein wrote:
>> Hi Michael,
>> 
>> On Thursday 06 December 2012 15:49:03, Wolfgang Grandegger wrote:
>>>> Details of the soak test:
>>>> 
>>>> There are two systems involved in the test: the PCH-System and
>>>> an External Node. The External Node transmits data at a high
>>>> rate, bringing bus utilization to ~35%. The PCH-System also
>>>> transmits data, in bursts of 10 messages every 5 ms. Combined,
>>>> the two systems utilize ~90% of bus bandwidth. The PCH-System
>>>> is constantly checking that it is receiving data from the 
>>>> External Node at the expected rate and in the expected order.
>> 
>> So you do a lot of transmit and reception of CAN frames?
>> 
>>> On another thread Alexander is reporting problems with the same
>>> driver when he runs an I2C application concurrently. Are you able
>>> to stress the system in a similar way?
>> 
>> Could you please test with the following patch? Do you see error
>> messages from this patch? Thanks!
> 
> To summarize my understanding of your problem(s). As long as there
> are no I2C transfers, everything works fine, right? The patch below
> does report some write-readback failures but that's due to reserved
> read-only bits. I assume that you also use my "RFC v2" patches for
> c_can.
> 
> Trouble starts with concurrent I2C transfers. Then the protected 
> write-readback test fails, which I regard as abnormal hardware
> behavior, resulting in message losses and out-of-order reception.

Out-of-sequence reception might be due to a problem in the rx algorithm.
And the drops are probably due to overflowing rx mailboxes.
The out-of-sequence reception will most likely occur if the rx fifo runs
full, but does not overflow.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
  2012-12-06 23:24                                                                                         ` Marc Kleine-Budde
@ 2012-12-10  8:21                                                                                         ` Alexander Stein
  2012-12-11 20:24                                                                                           ` Wolfgang Grandegger
  2012-12-11 14:46                                                                                         ` Michael Pellegrini
  2 siblings, 1 reply; 79+ messages in thread
From: Alexander Stein @ 2012-12-10  8:21 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Michael Pellegrini, linux-can

Hello Wolfgang,

I expect you meant me.

On Thursday 06 December 2012 23:02:50, Wolfgang Grandegger wrote:
> Hi Michael,
> 
> On 12/06/2012 06:05 PM, Alexander Stein wrote:
> > Hi Michael,
> > 
> > On Thursday 06 December 2012 15:49:03, Wolfgang Grandegger wrote:
> >>> Details of the soak test:
> >>>
> >>> There are two systems involved in the test: the PCH-System and an External Node.
> >>> The External Node transmits data at a high rate, bringing bus utilization to
> >>> ~35%.
> >>> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
> >>> Combined, the two systems utilize ~90% of bus bandwidth.
> >>> The PCH-System is constantly checking that it is receiving data from the
> >>> External Node at the expected rate and in the expected order.
> > 
> > So you do a lot of transmit and reception of CAN frames?
> > 
> >> On another thread Alexander is reporting problems with the same driver
> >> when he runs an I2C application concurrently. Are you able to stress the
> >> system in a similar way?
> > 
> > Could you please test with the following patch? Do you see error messages from this patch?
> > Thanks!
> 
> To summarize my understanding of your problem(s). As long as there are
> no I2C transfers, everything works fine, right? The patch below does
> report some write-readback failures but that's due to reserved read-only
> bits. I assume that you also use my "RFC v2" patches for c_can.

Yes, I think so. Here is the list of patches I cherry-picked or picked from the ML; the first one is the patch I posted on the ML:
# git logone v3.0.31.. drivers/net/can/c_can
eca55a90f1b459412fe6a06ade04168953b0cc0a c_can_pci: check writes in c_can_pci_write_reg_32bit
b30cd6e97e33c18b302acc069d4306976640005d c_can: add spinlock to protect tx and rx objects
8c0da92b71d15384e2e10b42eb9fee1d7566c91a c_can_pci: add support for PCH CAN on Intel EG20T PCH
3baeb05d514ae29959fccf57ef1d25d4e405ea2a c_can_pci: enable PCI bus master only for MSI
3bfe69aa4755e55067fb3100889557fb6784f5aa c_can_pci: introduce board specific PCI bar
ccb01456b3776d89d01f240ea4ca781139b8ca1f c_can: use different sets of interface registers for rx and tx
7369cf2ce4afeea3a4e9440ce057fc8ac0781bae c_can: rename callback "initram" to "init" to more general usage
8abbf3fafbca7bcedd7e63261918d98c2aa7b5b2 can: c_can_platform: add MODULE_DEVICE_TABLE
cf565c2f35b6df83cc2d9b1746aa78ed95a1c564 can: c_can: Add d_can raminit support
1ecf42b14e6acac66340347c59b15395ebdae8d2 can: c_can: fix segfault during rmmod
7031adc8ad27677f9ca53f41aeecc561e2855a9e can: c_can: Adopt pinctrl support
c461df9e2bb51973b2290d74428a9e86d8a832f4 can: c_can: Add d_can suspend resume support
f76251a1154c2fe2d485c2bc0e6bb319f95c89ef can: c_can: Add runtime PM support to Bosch C_CAN/D_CAN controller
c8eb3d0dad64123fc758518a12907293d68b64a2 can: c_can: Add device tree support to Bosch C_CAN/D_CAN controller
645820279b5dc4b2f6137267f779592caed225c5 can: c_can: Modify c_can device names
833a73a775d15141651eac682c958041ad74b6d5 net/c_can: remove conditional compilation of clk code
f76327017fa0ee6048322a603738849fde8b0fee can: c_can_pci: fix compilation on non HAVE_CLK archs
e8a58d604a51bf4a2b8a46999ad7615d6c93ee85 c_can_pci: generic module for C_CAN/D_CAN on PCI
0621d4c54a9451df25f5c26bedd64cdaafca2fbc can: c_can: precedence error in c_can_chip_config()
e0c82e969269124ac47f75c0dd44e59f63845d02 can: c_can: Add support for Bosch D_CAN controller
5957e31284f50c7af4537ebfb45659c42afaa112 can: c_can: Move overlay structure to array with offset as index
374b3b34644553d03c4f1714c2a14d8810af1c68 can: c_can: fix race condition in c_can_open()
a2101117c4edfd9ec6fb059094ab74e4235da2e3 can: c_can: fix an interrupt thrash issue with c_can driver
20db935eff7b0ff4f1842f4de1f4e7f946d313dc can: c_can: fix "BUG! echo_skb is occupied!" during transmit
756f86a219db885c7b5aacf70ddefe961aad118a can: c_can: remove duplicated #include

> Trouble starts with concurrent I2C transfers. Then the protected
> write-readback test fails, which I regard as abnormal hardware behavior,
> resulting in message losses and out-of-order reception.

I cannot say whether any (small) I2C transfer at all triggers the problem. I ran 'cangen -I 0x300 can0' on my PC connected to my test board. An I2C-connected LED is triggered by the heartbeat, so there is a small amount of I2C traffic each second. I couldn't see any errors in dmesg in about 10 minutes.
But even with that small CAN traffic (next to nothing), a 'watch sensors' (which queries several I2C sensors every 2s) caused errors in dmesg. It seems the problem isn't related to CAN bus load at all.

Best regards,
Alexander



* Re: pch_can: Data transmission stops after dropped packet
  2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
  2012-12-06 23:24                                                                                         ` Marc Kleine-Budde
  2012-12-10  8:21                                                                                         ` Alexander Stein
@ 2012-12-11 14:46                                                                                         ` Michael Pellegrini
  2012-12-11 20:21                                                                                           ` Wolfgang Grandegger
  2012-12-12 13:35                                                                                           ` Alexander Stein
  2 siblings, 2 replies; 79+ messages in thread
From: Michael Pellegrini @ 2012-12-11 14:46 UTC (permalink / raw)
  To: linux-can

Wolfgang Grandegger <wg <at> grandegger.com> writes:

> To summarize my understanding of your problem(s). As long as there are
> no I2C transfers, everything works fine, right? The patch below does
> report some write-readback failures but that's due to reserved read-only
> bits. I assume that you also use my "RFC v2" patches for c_can.
> 
> Trouble starts with concurrent I2C transfers. Then the protected
> write-readback test fails, which I regard as abnormal hardware behavior,
> resulting in message losses and out-of-order reception.
> 
> Would be interesting to compare the hardware. Michael, could you also
> show the output of "lspci -vv".

I applied the patch provided.  On driver load, dmesg reports the following:

[  379.817717] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817733] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817856] c_can_pci 0000:02:0c.3: can0: obj no:6, msgval:0x0000003f
[  379.817875] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817891] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817907] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817923] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817939] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817955] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817971] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.817987] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818003] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818019] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818035] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818051] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818067] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818082] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818098] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818114] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818130] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818146] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818162] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818178] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818194] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818228] c_can_pci 0000:02:0c.3: can0: obj no:7, msgval:0x0000007f
[  379.818246] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818262] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818278] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818294] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818310] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818326] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818342] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818358] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818374] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818390] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818406] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818422] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818437] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. got: 0x2000
[  379.818453] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818469] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818485] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818501] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818517] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818533] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818549] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818565] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818599] c_can_pci 0000:02:0c.3: can0: obj no:8, msgval:0x000000ff
[  379.818617] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818634] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818650] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818665] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818679] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818694] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818709] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818724] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818739] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818754] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818769] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818784] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818799] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818814] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818828] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818844] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818858] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818873] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818888] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818903] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818918] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818951] c_can_pci 0000:02:0c.3: can0: obj no:9, msgval:0x000001ff
[  379.818968] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818983] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.818998] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819013] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819028] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819043] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819058] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819073] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819088] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819103] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819118] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819133] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819148] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819163] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819178] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819193] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819208] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819223] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819238] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819253] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819268] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819301] c_can_pci 0000:02:0c.3: can0: obj no:10, msgval:0x000003ff
[  379.819318] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819333] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819348] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819362] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819377] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819393] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819409] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819426] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819442] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819457] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819473] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819488] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819504] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819519] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819534] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819550] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819565] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819581] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819596] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819612] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819627] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819661] c_can_pci 0000:02:0c.3: can0: obj no:11, msgval:0x000007ff
[  379.819678] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819694] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819710] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819726] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819742] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819758] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819774] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819790] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819806] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819822] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819838] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819854] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819869] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819885] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819901] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819917] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819933] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819949] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819965] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819981] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.819997] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820075] c_can_pci 0000:02:0c.3: can0: obj no:12, msgval:0x00000fff
[  379.820093] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820110] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820125] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820142] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820157] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820173] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820189] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820205] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820221] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820237] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820252] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820268] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820284] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820300] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820316] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820332] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820348] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820364] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820380] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820396] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820412] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820446] c_can_pci 0000:02:0c.3: can0: obj no:13, msgval:0x00001fff
[  379.820464] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820480] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820496] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820512] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820528] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820544] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820560] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820576] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820591] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820607] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820623] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820639] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820655] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820671] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820687] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820703] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820719] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820734] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820750] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820766] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820782] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820817] c_can_pci 0000:02:0c.3: can0: obj no:14, msgval:0x00003fff
[  379.820835] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820851] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820867] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820883] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820899] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820915] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820930] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820946] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820962] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820978] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.820994] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821010] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821026] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821042] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821058] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821073] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821089] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821105] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821121] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821137] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821153] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821187] c_can_pci 0000:02:0c.3: can0: obj no:15, msgval:0x00007fff
[  379.821205] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821221] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821237] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821253] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821269] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821285] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821301] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821317] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821332] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821348] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821364] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821380] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821397] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821413] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821428] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821445] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821460] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821475] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821490] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821505] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821520] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
got: 0x2000
[  379.821553] c_can_pci 0000:02:0c.3: can0: obj no:16, msgval:0x0000ffff
[  379.821568] c_can_pci 0000:02:0c.3: can0: setting BTR=0518 BRPE=0000

"lspci -vv" reports:

02:0c.2 Serial bus controller [0c80]: Intel Corporation Platform Controller Hub 
EG20T I2C Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin C routed to IRQ 7
        Region 1: Memory at d0144000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [40] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-
,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

02:0c.3 CANBUS: Intel Corporation Platform Controller Hub EG20T Controller Area 
Network (CAN) Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
<MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin C routed to IRQ 42
        Region 1: Memory at d0143000 (32-bit, non-prefetchable) [size=512]
        Capabilities: [40] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0300c  Data: 4179
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-
,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: c_can_pci
        Kernel modules: c_can_pci

Based on a quick test, the CAN interface appears functional with this driver.
However, I am unable to test the I2C interface as I can't get I2C devices to
appear in /dev right now.

I apologize for the delay in getting these results back to you.

- Mike


* Re: pch_can: Data transmission stops after dropped packet
  2012-12-11 14:46                                                                                         ` Michael Pellegrini
@ 2012-12-11 20:21                                                                                           ` Wolfgang Grandegger
  2012-12-12 13:35                                                                                           ` Alexander Stein
  1 sibling, 0 replies; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-12-11 20:21 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

On 12/11/2012 03:46 PM, Michael Pellegrini wrote:
> Wolfgang Grandegger <wg <at> grandegger.com> writes:
> 
>> To summarize my understanding of your problem(s). As long as there are
>> no I2C transfers, everything works fine, right? The patch below does
>> report some write-readback failures but that's due to reserved read-only
>> bits. I assume that you also use my "RFC v2" patches for c_can.
>>
>> Trouble starts with concurrent I2C transfers. Then the protected
>> write-readback test fails, which I regard as abnormal hardware behavior,
>> resulting in message losses and out-of-order reception.
>>
>> Would be interesting to compare the hardware. Michael, could you also
>> show the output of "lspci -vv".
> 
> I applied the patch provided.  On driver load, dmesg reports the following:
> 
> [  379.817717] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
> got: 0x2000

This is normal because the register always returns that *read-only* bit.
Nothing to worry about. You do not seem to observe the serious
write-readback errors Alexander reported.
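
As an illustration only (this is not the patch under test): a
write-readback check could avoid such false alarms by comparing only the
writable bits. The sketch below simulates the register in user space. The
value 0x2000 is taken from the log above; the names sim_reg, write_checked
and REG_2C_RO_MASK are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption for this example: bit 13 of the register at offset 0x2c is
 * read-only and always reads back as 1 ("got: 0x2000" in the log). */
#define REG_2C_RO_MASK 0x2000u

/* Simulated register: keeps the writable bits, forces read-only bits to 1. */
struct sim_reg {
	uint32_t value;
	uint32_t ro_mask;
};

static void sim_write(struct sim_reg *r, uint32_t v)
{
	r->value = (v & ~r->ro_mask) | r->ro_mask;
}

static uint32_t sim_read(const struct sim_reg *r)
{
	return r->value;
}

/* Write, read back, and compare only the writable bits.
 * Returns 0 if they match, -1 otherwise. */
static int write_checked(struct sim_reg *r, uint32_t v)
{
	uint32_t got;

	sim_write(r, v);
	got = sim_read(r);
	if ((got & ~r->ro_mask) != (v & ~r->ro_mask)) {
		fprintf(stderr, "write 0x%x failed. got: 0x%x\n",
			(unsigned int)v, (unsigned int)got);
		return -1;
	}
	return 0;
}
```

With such a mask, writing 0x0 and reading back 0x2000 is not reported,
while a mismatch in any writable bit still is.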

> "lspci -vv" reports:
> 
> 02:0c.2 Serial bus controller [0c80]: Intel Corporation Platform Controller Hub 
> EG20T I2C Controller
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin C routed to IRQ 7
>         Region 1: Memory at d0144000 (32-bit, non-prefetchable) [size=256]
>         Capabilities: [40] MSI: Enable- Count=1/1 Maskable- 64bit-
>                 Address: 00000000  Data: 0000
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-
> ,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 
> 02:0c.3 CANBUS: Intel Corporation Platform Controller Hub EG20T Controller Area 
> Network (CAN) Controller
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
> Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- 
> <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin C routed to IRQ 42
>         Region 1: Memory at d0143000 (32-bit, non-prefetchable) [size=512]
>         Capabilities: [40] MSI: Enable+ Count=1/1 Maskable- 64bit-
>                 Address: fee0300c  Data: 4179
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-
> ,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Kernel driver in use: c_can_pci
>         Kernel modules: c_can_pci
> 
> Based on a quick test, the CAN interface appears functional with this driver.
> However, I am unable to test the I2C interface as I can't get I2C devices to
> appear in /dev right now.
> 
> I apologize for the delay in getting these results back to you.

OK, anyway, thanks for testing.

Wolfgang.


* Re: pch_can: Data transmission stops after dropped packet
  2012-12-10  8:21                                                                                         ` Alexander Stein
@ 2012-12-11 20:24                                                                                           ` Wolfgang Grandegger
  2012-12-13 14:04                                                                                             ` Alexander Stein
  0 siblings, 1 reply; 79+ messages in thread
From: Wolfgang Grandegger @ 2012-12-11 20:24 UTC (permalink / raw)
  To: Alexander Stein; +Cc: Michael Pellegrini, linux-can

On 12/10/2012 09:21 AM, Alexander Stein wrote:
> Hello Wolfgang,
> 
> I expect you meant me.
> 
> On Thursday 06 December 2012 23:02:50, Wolfgang Grandegger wrote:
>> Hi Michael,
>>
>> On 12/06/2012 06:05 PM, Alexander Stein wrote:
>>> Hi Michael,
>>>
>>> On Thursday 06 December 2012 15:49:03, Wolfgang Grandegger wrote:
>>>>> Details of the soak test:
>>>>>
>>>>> There are two systems involved in the test: the PCH-System and an External Node.
>>>>> The External Node transmits data at a high rate, bringing bus utilization to
>>>>> ~35%.
>>>>> The PCH-System also transmits data, in bursts of 10 messages every 5 ms.
>>>>> Combined, the two systems utilize ~90% of bus bandwidth.
>>>>> The PCH-System is constantly checking that it is receiving data from the
>>>>> External Node at the expected rate and in the expected order.
>>>
>>> So you do a lot of transmit and reception of CAN frames?
>>>
>>>> On another thread Alexander is reporting problems with the same driver
>>>> when he runs a I2C application concurrently. Are you able to stress the
>>>> system in a similar way?
>>>
>>> Could you please test with the following patch? Do you see error messages from this patch?
>>> Thanks!
>>
>> To summarize my understanding of your problem(s). As long as there are
>> no I2C transfers, everything works fine, right? The patch below does
>> report some write-readback failures but that's due to reserved read-only
>> bits. I assume that you also use my "RFC v2" patches for c_can.
> 
> Yes, I think so, here is the list of patches I cherry-picked or picked from the ML. The first one is the patch I posted on the ML:
> # git logone v3.0.31.. drivers/net/can/c_can
> eca55a90f1b459412fe6a06ade04168953b0cc0a c_can_pci: check writes in c_can_pci_write_reg_32bit
> b30cd6e97e33c18b302acc069d4306976640005d c_can: add spinlock to protect tx and rx objects
> 8c0da92b71d15384e2e10b42eb9fee1d7566c91a c_can_pci: add support for PCH CAN on Intel EG20T PCH
> 3baeb05d514ae29959fccf57ef1d25d4e405ea2a c_can_pci: enable PCI bus master only for MSI
> 3bfe69aa4755e55067fb3100889557fb6784f5aa c_can_pci: introduce board specific PCI bar
> ccb01456b3776d89d01f240ea4ca781139b8ca1f c_can: use different sets of interface registers for rx and tx
> 7369cf2ce4afeea3a4e9440ce057fc8ac0781bae c_can: rename callback "initram" to "init" to more general usage
> 8abbf3fafbca7bcedd7e63261918d98c2aa7b5b2 can: c_can_platform: add MODULE_DEVICE_TABLE
> cf565c2f35b6df83cc2d9b1746aa78ed95a1c564 can: c_can: Add d_can raminit support
> 1ecf42b14e6acac66340347c59b15395ebdae8d2 can: c_can: fix segfault during rmmod
> 7031adc8ad27677f9ca53f41aeecc561e2855a9e can: c_can: Adopt pinctrl support
> c461df9e2bb51973b2290d74428a9e86d8a832f4 can: c_can: Add d_can suspend resume support
> f76251a1154c2fe2d485c2bc0e6bb319f95c89ef can: c_can: Add runtime PM support to Bosch C_CAN/D_CAN controller
> c8eb3d0dad64123fc758518a12907293d68b64a2 can: c_can: Add device tree support to Bosch C_CAN/D_CAN controller
> 645820279b5dc4b2f6137267f779592caed225c5 can: c_can: Modify c_can device names
> 833a73a775d15141651eac682c958041ad74b6d5 net/c_can: remove conditional compilation of clk code
> f76327017fa0ee6048322a603738849fde8b0fee can: c_can_pci: fix compilation on non HAVE_CLK archs
> e8a58d604a51bf4a2b8a46999ad7615d6c93ee85 c_can_pci: generic module for C_CAN/D_CAN on PCI
> 0621d4c54a9451df25f5c26bedd64cdaafca2fbc can: c_can: precedence error in c_can_chip_config()
> e0c82e969269124ac47f75c0dd44e59f63845d02 can: c_can: Add support for Bosch D_CAN controller
> 5957e31284f50c7af4537ebfb45659c42afaa112 can: c_can: Move overlay structure to array with offset as index
> 374b3b34644553d03c4f1714c2a14d8810af1c68 can: c_can: fix race condition in c_can_open()
> a2101117c4edfd9ec6fb059094ab74e4235da2e3 can: c_can: fix an interrupt thrash issue with c_can driver
> 20db935eff7b0ff4f1842f4de1f4e7f946d313dc can: c_can: fix "BUG! echo_skb is occupied!" during transmit
> 756f86a219db885c7b5aacf70ddefe961aad118a can: c_can: remove duplicated #include

Oops, that's a rather long list of fixes and improvements.

> 
>> Trouble starts with concurrent I2C transfers. Then the protected
>> write-readback test fails, which I regard as abnormal hardware behavior,
>> resulting in message losses and out-of-order reception.
> 
> I cannot say whether any (small) I2C transfer at all triggers the problem. I run 'cangen -I 0x300 can0' on my PC connected to my test board. An I2C-connected LED is triggered by heartbeat, so there is a small amount of I2C traffic each second. I couldn't see any errors in dmesg in about 10 minutes.
> But even with that small CAN traffic (next to nothing) a 'watch sensors' (which queries several I2C sensors every 2s) caused errors in dmesg. It seems the problem isn't related to CAN bus load at all.

Yes, that's also my impression. Most likely it's a problem on the PCI
bus with caching or concurrent access. Any chance to use a more recent
version of the Linux kernel?

Wolfgang.


* Re: pch_can: Data transmission stops after dropped packet
  2012-12-11 14:46                                                                                         ` Michael Pellegrini
  2012-12-11 20:21                                                                                           ` Wolfgang Grandegger
@ 2012-12-12 13:35                                                                                           ` Alexander Stein
  1 sibling, 0 replies; 79+ messages in thread
From: Alexander Stein @ 2012-12-12 13:35 UTC (permalink / raw)
  To: Michael Pellegrini; +Cc: linux-can

Hello Michael,

On Tuesday 11 December 2012 14:46:37, Michael Pellegrini wrote:
> I applied the patch provided.  On driver load, dmesg reports the following:
> 
> [  379.817717] c_can_pci 0000:02:0c.3: can0: write 0x0 to offset 0x2c failed. 
> got: 0x2000
> [...]

This is expected, since the reserved bit always reads as 1. I see the same behavior when there is no I2C traffic: CAN works fine. I just lose some messages from time to time, which is IMO related to the algorithm.

> Based on a quick test, the CAN interface appears functional with this driver.
> However, I am unable to test the I2C interface as I can't get I2C devices to
> appear in /dev right now.

A /dev/i2c-0 device (or -1 or whatever, depending on your bus number ordering) should be enough to use tools such as i2cdetect. Alternatively, if you have hwmon sensors, they can be queried from within /sys/class/hwmon/hwmonX/.
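
For the hwmon route, a minimal sketch of reading a single attribute from C
could look like the following. It assumes the sensor exposes a numeric
attribute such as temp1_input (millidegrees Celsius in the usual hwmon
naming); whether your sensors provide that file is an assumption, and the
helper name read_sysfs_long is made up for this example.

```c
#include <stdio.h>

/* Read one decimal integer from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_sysfs_long(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling it in a loop on e.g. /sys/class/hwmon/hwmon0/temp1_input (path is
an example) generates periodic I2C traffic if the sensor sits on the I2C
bus, similar to what 'watch sensors' does.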

Best regards,
Alexander


* Re: pch_can: Data transmission stops after dropped packet
  2012-12-11 20:24                                                                                           ` Wolfgang Grandegger
@ 2012-12-13 14:04                                                                                             ` Alexander Stein
  0 siblings, 0 replies; 79+ messages in thread
From: Alexander Stein @ 2012-12-13 14:04 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Michael Pellegrini, linux-can

Hello Wolfgang,

On Tuesday 11 December 2012 21:24:57, Wolfgang Grandegger wrote:
> > I cannot say if any (small) I2C transfer at all raises the problem. I run 'cangen -I 0x300 can0' on my PC connected to my test board. An I2C-connected LED is triggered by heartbeat, thus there is a small amount of I2C traffic each second. I couldn't see any errors in dmesg in about 10 minutes.
> > But even with that small CAN traffic (next to nothing) a 'watch sensors' (which queries several I2C sensors every 2s) caused errors in dmesg. It seems the problem isn't related to CAN bus load at all.
> 
> Yes, that's also my impression. Most likely it's a problem on the PCI
> bus with caching or concurrent access. Any chance to use a more recent
> version of the Linux kernel?

I tried your v3 patchset based on 6be35c700f742e911ecedd07fcc43d4439922334 (which is next-next/master being merged into Linus' master branch) and it got even "worse": I now get PCI write errors even if there is no I2C traffic at all (I compared the interrupts before and after the test). In the end it's the same error in each case:
[  321.702036] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  322.630034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  350.026035] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  354.932033] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  374.812036] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  378.099034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  386.068034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  399.639034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  401.034033] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  410.143034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  415.082034] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  418.593033] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  439.871035] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  564.614037] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
[  586.593035] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
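
To see how these failures are spread over time, a log like the one above can be summarized with a few lines of Python. This is only a sketch: the regular expression is written against the message format shown here, and the function names are made up for the example.

```python
import re
from collections import Counter

# Matches lines such as:
# [  321.702036] c_can_pci 0000:02:0c.3 can0: write 0xe to offset 0x0 failed. got: 0x0
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s.*?write (?P<val>0x[0-9a-fA-F]+) "
    r"to offset (?P<off>0x[0-9a-fA-F]+) failed\.\s+got: (?P<got>0x[0-9a-fA-F]+)"
)


def parse_failures(text):
    """Return a list of (timestamp, written, offset, got) tuples."""
    return [
        (float(m["ts"]), int(m["val"], 16), int(m["off"], 16), int(m["got"], 16))
        for m in LINE_RE.finditer(text)
    ]


def summarize(failures):
    """Count failures per (written, offset, got) and list the gaps in seconds."""
    kinds = Counter((w, off, got) for _, w, off, got in failures)
    stamps = [ts for ts, *_ in failures]
    gaps = [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]
    return kinds, gaps


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg | python3 this_script.py
    kinds, gaps = summarize(parse_failures(sys.stdin.read()))
    for (w, off, got), n in kinds.items():
        print(f"write {w:#x} to offset {off:#x}, got {got:#x}: {n} times")
    print("gaps between failures (s):", gaps)
```

For the log above this would show a single failure kind (0xe written to offset 0x0, 0x0 read back) at irregular intervals.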

I compared the output of lspci -vv from v3.0.31+ and v3.7+ and there are only a few changes which, I suspect, should not have any influence:
* different MSI Data register content
* PCI bridges have a bigger prefetchable memory range
* driver description for HDA audio changed
* Address of the expansion ROM from the external ethernet controller changed (same as the address at the bridges)
* the "Kernel driver in use" entries ehci_hcd and ehci-pci reported by lspci for USB have swapped

All in all, nothing that should have any influence.
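
For anyone wanting to repeat this comparison, a minimal Python sketch using difflib is given below. It assumes the two `lspci -vv` outputs were saved to text files beforehand; the function name and the labels are just for illustration.

```python
import difflib


def diff_lspci(old_text, new_text, old_label="v3.0.31+", new_label="v3.7+"):
    """Return unified-diff lines between two saved `lspci -vv` outputs."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="", n=0))


if __name__ == "__main__":
    import sys
    # Usage (assumed): python3 this_script.py lspci-old.txt lspci-new.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            print("\n".join(diff_lspci(a.read(), b.read())))
```

Setting n=0 drops the context lines, so only the changed register and driver lines are printed.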

Best regards,
Alexander



end of thread, other threads:[~2012-12-13 14:04 UTC | newest]

Thread overview: 79+ messages
2012-11-14 15:39 pch_can: Data transmission stops after dropped packet Michael Pellegrini
2012-11-14 21:40 ` Michael Pellegrini
2012-11-15  7:18 ` Oliver Hartkopp
2012-11-15 13:13   ` Wolfgang Grandegger
2012-11-15 16:23     ` Michael Pellegrini
2012-11-15 21:19       ` Wolfgang Grandegger
2012-11-15 21:34         ` Michael Pellegrini
2012-11-15 21:51           ` Wolfgang Grandegger
2012-11-18 22:22           ` Wolfgang Grandegger
2012-11-19 15:10             ` Michael Pellegrini
2012-11-19 15:26               ` Wolfgang Grandegger
2012-11-19 16:20                 ` Michael Pellegrini
2012-11-19 16:31                   ` Wolfgang Grandegger
2012-11-19 17:39                     ` Michael Pellegrini
2012-11-19 19:22                       ` Wolfgang Grandegger
2012-11-19 20:19                         ` Michael Pellegrini
2012-11-19 21:46                           ` Wolfgang Grandegger
2012-11-20 14:25                             ` Michael Pellegrini
2012-11-20 16:12                               ` Wolfgang Grandegger
2012-11-20 19:12                                 ` Michael Pellegrini
2012-11-20 21:05                                   ` Wolfgang Grandegger
2012-11-21 10:24                                     ` Wolfgang Grandegger
     [not found]                                       ` <loom.20121121T160744-278@post.gmane.org>
2012-11-21 15:15                                       ` Michael Pellegrini
     [not found]                                       ` <loom.20121121T160744-278@post.gmane.org>
2012-11-21 15:25                                         ` Michael Pellegrini
2012-11-21 15:32                                           ` Marc Kleine-Budde
2012-11-21 16:11                                             ` Michael Pellegrini
2012-11-21 15:41                                           ` Michael Pellegrini
2012-11-21 15:56                                           ` Wolfgang Grandegger
2012-11-21 16:09                                             ` Michael Pellegrini
2012-11-21 16:41                                               ` Wolfgang Grandegger
2012-11-21 16:58                                                 ` Casper Mogensen
2012-11-21 19:48                                                   ` Wolfgang Grandegger
2012-11-21 17:43                                                 ` Michael Pellegrini
2012-11-21 19:55                                                   ` Wolfgang Grandegger
2012-11-21 21:00                                                     ` Michael Pellegrini
2012-11-23 14:27                                                       ` Michael Pellegrini
2012-11-23 14:45                                                         ` Wolfgang Grandegger
2012-11-23 14:47                                                           ` Wolfgang Grandegger
2012-11-23 15:14                                                             ` Michael Pellegrini
2012-11-23 15:04                                                           ` Michael Pellegrini
2012-11-23 17:00                                                             ` Wolfgang Grandegger
2012-11-23 17:18                                                               ` Wolfgang Grandegger
2012-11-23 17:52                                                                 ` Michael Pellegrini
2012-11-25 16:17                                                                   ` Wolfgang Grandegger
2012-11-26 14:54                                                                     ` Michael Pellegrini
2012-11-26 15:30                                                                       ` Wolfgang Grandegger
2012-11-26 17:30                                                                         ` Michael Pellegrini
2012-11-26 18:13                                                                           ` Wolfgang Grandegger
2012-11-29 12:15                                                                             ` Wolfgang Grandegger
2012-11-29 14:15                                                                               ` Michael Pellegrini
2012-12-06 14:20                                                                                 ` Michael Pellegrini
2012-12-06 14:23                                                                                   ` Marc Kleine-Budde
2012-12-06 14:41                                                                                     ` Wolfgang Grandegger
2012-12-06 14:42                                                                                       ` Marc Kleine-Budde
2012-12-06 14:42                                                                                     ` Michael Pellegrini
2012-12-06 14:49                                                                                   ` Wolfgang Grandegger
2012-12-06 17:05                                                                                     ` Alexander Stein
2012-12-06 22:02                                                                                       ` Wolfgang Grandegger
2012-12-06 23:24                                                                                         ` Marc Kleine-Budde
2012-12-10  8:21                                                                                         ` Alexander Stein
2012-12-11 20:24                                                                                           ` Wolfgang Grandegger
2012-12-13 14:04                                                                                             ` Alexander Stein
2012-12-11 14:46                                                                                         ` Michael Pellegrini
2012-12-11 20:21                                                                                           ` Wolfgang Grandegger
2012-12-12 13:35                                                                                           ` Alexander Stein
2012-12-06 22:11                                                                                     ` Michael Pellegrini
2012-12-06 23:23                                                                                       ` Michael Pellegrini
2012-11-24  7:16                                                               ` Wolfgang Grandegger
2012-11-26  3:33                                                                 ` Bhupesh SHARMA
2012-11-21 14:52                                     ` Michael Pellegrini
2012-11-21 15:02                                       ` Wolfgang Grandegger
2012-11-15 16:32     ` Casper Mogensen
2012-11-15 21:16       ` Wolfgang Grandegger
2012-11-16 19:39     ` Wolfgang Grandegger
2012-11-15 16:12   ` Michael Pellegrini
2012-11-20 18:59     ` Wolfgang Grandegger
2012-11-15 12:35 ` Steffen Rose
2012-11-15 18:26   ` Michael Pellegrini
2012-11-16  8:24     ` Steffen Rose
