* [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
@ 2022-05-09 17:03 Devid Antonio Filoni
  2022-05-09 19:04 ` Kurt Van Dijck
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-05-09 17:03 UTC (permalink / raw)
  To: Robin van der Gracht, Oleksij Rempel
  Cc: kernel, linux-can, Oleksij Rempel, Oliver Hartkopp,
	Marc Kleine-Budde, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Maxime Jayat, kbuild test robot, netdev, linux-kernel,
	Devid Antonio Filoni

This is not explicitly stated in SAE J1939-21 and some tools used for
ISO-11783 certification do not expect this wait.

Fixes: 9d71dd0 ("can: add support of SAE J1939 protocol")
Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
---
 net/can/j1939/address-claim.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/can/j1939/address-claim.c b/net/can/j1939/address-claim.c
index f33c47327927..1d070c08edf1 100644
--- a/net/can/j1939/address-claim.c
+++ b/net/can/j1939/address-claim.c
@@ -165,6 +165,12 @@ static void j1939_ac_process(struct j1939_priv *priv, struct sk_buff *skb)
 	 * leaving this function.
 	 */
 	ecu = j1939_ecu_get_by_name_locked(priv, name);
+
+	if (ecu && ecu->addr == skcb->addr.sa) {
+		/* the address was already claimed with the same name, nothing to do */
+		goto out_ecu_put;
+	}
+
 	if (!ecu && j1939_address_is_unicast(skcb->addr.sa))
 		ecu = j1939_ecu_create_locked(priv, name);
 
-- 
2.25.1


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-09 17:03 [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed Devid Antonio Filoni
@ 2022-05-09 19:04 ` Kurt Van Dijck
  2022-05-10  4:26   ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Kurt Van Dijck @ 2022-05-09 19:04 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Robin van der Gracht, Oleksij Rempel, kernel, linux-can,
	Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:
> This is not explicitly stated in SAE J1939-21 and some tools used for
> ISO-11783 certification do not expect this wait.

IMHO, the current behaviour is not explicitly stated, but neither is the opposite.
And if I'm not mistaken, this introduces a 250msec delay.

1. If you want to avoid the 250msec gap, you should avoid contesting the same address.

2. It's a balance between predictability and flexibility, but if you try to accomplish both,
as your patch suggests, there is a slight time window until the current owner responds,
in which it may be unclear which node has the address. It depends on how much history
you have collected on the bus.

I'm sure that this problem decreases with increasing processing power on the nodes,
but bigger internal queues also increase this window.

It would certainly help if you described how the current implementation fails.

Would decreasing the dead time to 50msec help in such a case?

Kind regards,
Kurt

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-09 19:04 ` Kurt Van Dijck
@ 2022-05-10  4:26   ` Oleksij Rempel
  2022-05-10 11:00     ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-05-10  4:26 UTC (permalink / raw)
  To: Devid Antonio Filoni, Robin van der Gracht, kernel, linux-can,
	Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

Hi,

On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:
> On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:
> > This is not explicitly stated in SAE J1939-21 and some tools used for
> > ISO-11783 certification do not expect this wait.

It would be interesting to know which certification tool does not expect it, and
what explanation it gives when the test fails.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-10  4:26   ` Oleksij Rempel
@ 2022-05-10 11:00     ` Devid Antonio Filoni
  2022-05-11  8:47       ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-05-10 11:00 UTC (permalink / raw)
  To: Oleksij Rempel, Kurt Van Dijck
  Cc: Robin van der Gracht, kernel, linux-can, Oleksij Rempel,
	Oliver Hartkopp, Marc Kleine-Budde, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Maxime Jayat, kbuild test robot,
	netdev, linux-kernel

Hi,

The test that is executed during the ISOBUS compliance check is the
following: after an address has been claimed by a CF (#1), another CF
(#2) sends a message (other than address-claim) using the same address
claimed by CF #1.

As per ISO11783-5 standard, if a CF receives a message, other than the
address-claimed message, which uses the CF's own SA, then the CF (#1):
- shall send the address-claim message to the Global address;
- shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
31
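
As an illustration of the DTC named in the list above, here is a minimal
sketch in C of how SPN = 2000+SA and FMI = 31 could be packed into the
4-byte DTC field of a DM1 payload. The byte layout is an assumption (the
J1939-73 layout with the conversion-method bit set to 0), and the helper
names addr_violation_spn() and pack_dtc() are hypothetical, not part of
the kernel driver.

```c
#include <stdint.h>

/* FMI quoted above for an address violation. */
#define ADDR_VIOLATION_FMI 31u

/* SPN for the address-violation DTC, following the rule above: 2000 + SA. */
static uint32_t addr_violation_spn(uint8_t sa)
{
	return 2000u + sa;
}

/*
 * Pack one DTC into 4 bytes (assumed layout, conversion method = 0):
 *   out[0]: SPN bits 0-7
 *   out[1]: SPN bits 8-15
 *   out[2]: SPN bits 16-18 in the top 3 bits, FMI in the low 5 bits
 *   out[3]: occurrence count in the low 7 bits
 */
static void pack_dtc(uint32_t spn, uint8_t fmi, uint8_t occurrence,
		     uint8_t out[4])
{
	out[0] = (uint8_t)(spn & 0xFFu);
	out[1] = (uint8_t)((spn >> 8) & 0xFFu);
	out[2] = (uint8_t)((((spn >> 16) & 0x07u) << 5) | (fmi & 0x1Fu));
	out[3] = (uint8_t)(occurrence & 0x7Fu);
}
```

For SA = 0xF0 this gives SPN 2240 (0x8C0), so the packed bytes start with
C0 08 1F, which is the pattern visible in bytes 3 to 5 of the DM1 frames
in the trace later in this message.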

After the address-claim message is sent by CF #1, as per ISO11783-5
standard:
- If the name of CF #1 has a lower priority than that of CF #2, then
CF #2 shall send its address-claim message, and thus CF #1 shall send
the cannot-claim-address message or shall execute the claim procedure
again with a new address
- If the name of CF #1 has a higher priority than that of CF #2, then
CF #2 shall send the cannot-claim-address message or shall execute the
claim procedure with a new address
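
The two cases above can be sketched as a comparison of the 64-bit NAMEs,
assuming the usual J1939 convention that the numerically lower NAME has the
higher priority. The enum and the function resolve_claim() are hypothetical
names used only for this illustration.

```c
#include <stdint.h>

enum claim_outcome {
	KEEP_ADDRESS,	/* own NAME wins, the other CF must yield */
	YIELD_ADDRESS,	/* other NAME wins: send cannot-claim or re-claim */
	SAME_NAME	/* identical NAMEs: not a contest between two CFs */
};

/*
 * Decide the outcome of an address contest from the point of view of
 * the CF that owns own_name. Assumption: lower NAME value = higher
 * priority.
 */
static enum claim_outcome resolve_claim(uint64_t own_name, uint64_t other_name)
{
	if (own_name == other_name)
		return SAME_NAME;
	return own_name < other_name ? KEEP_ADDRESS : YIELD_ADDRESS;
}
```

The SAME_NAME case is kept separate because a repeated claim with the same
NAME and the same address is the situation the patch in this thread is
about.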

The above conflict management is OK with the current J1939 driver
implementation. However, since the driver always waits 250ms after
sending an address-claim message, CF #1 cannot set the DTC. The DM1
message, which is expected to be sent each second (as per the J1939-73
standard), may not be sent.

Honestly, I don't know which company is doing the ISOBUS compliance
tests on our products or which tool they use, as it was chosen by our
customer. However, they did send us some CAN traces of previously
performed tests, and we noticed that the DM1 message is sent 160ms after
the address-claim message (but it may also be less than that). This
is something that we cannot do because the driver blocks the application
from sending it.

28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
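
To read the identifiers in the trace above, a small C sketch of the standard
29-bit J1939 identifier split (priority, PGN, destination and source
address) may help. The struct j1939_id_fields and the function decode_id()
are hypothetical names for this illustration and are not the kernel's own
helpers.

```c
#include <stdint.h>

struct j1939_id_fields {
	uint8_t priority;	/* bits 26-28 */
	uint32_t pgn;		/* parameter group number */
	uint8_t da;		/* destination address, 0xFF if broadcast */
	uint8_t sa;		/* source address, bits 0-7 */
};

/* Split a 29-bit extended CAN identifier into its J1939 fields. */
static struct j1939_id_fields decode_id(uint32_t can_id)
{
	struct j1939_id_fields f;
	uint32_t pf = (can_id >> 16) & 0xFFu;	/* PDU format */
	uint32_t ps = (can_id >> 8) & 0xFFu;	/* PDU specific */
	uint32_t page = (can_id >> 24) & 0x03u;	/* EDP and DP bits */

	f.priority = (uint8_t)((can_id >> 26) & 0x07u);
	f.sa = (uint8_t)(can_id & 0xFFu);
	if (pf < 240u) {
		/* PDU1: PS carries the destination address */
		f.pgn = (page << 16) | (pf << 8);
		f.da = (uint8_t)ps;
	} else {
		/* PDU2: PS is part of the PGN, message is broadcast */
		f.pgn = (page << 16) | (pf << 8) | ps;
		f.da = 0xFFu;
	}
	return f;
}
```

With this split, 18EEFFF0 decodes to PGN 0xEE00 (address claimed) sent to
the global address 0xFF from SA 0xF0, and 18FECAF0 decodes to PGN 0xFECA
(DM1) from the same SA 0xF0.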

Since the 250ms wait is not explicitly stated, IMHO it should be up to
the user-space implementation to decide how to manage it.

Thank you,
Devid


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-10 11:00     ` Devid Antonio Filoni
@ 2022-05-11  8:47       ` Oleksij Rempel
  2022-05-11  9:06         ` David Jander
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-05-11  8:47 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Kurt Van Dijck, Robin van der Gracht, kernel, linux-can,
	Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel, David Jander

Hi,

I'll CC more J1939 users to the discussion.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-11  8:47       ` Oleksij Rempel
@ 2022-05-11  9:06         ` David Jander
  2022-05-11 12:55           ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: David Jander @ 2022-05-11  9:06 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Devid Antonio Filoni, Kurt Van Dijck, Robin van der Gracht,
	kernel, linux-can, Oleksij Rempel, Oliver Hartkopp,
	Marc Kleine-Budde, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Maxime Jayat, kbuild test robot, netdev, linux-kernel


Hi,

On Wed, 11 May 2022 10:47:28 +0200
Oleksij Rempel <o.rempel@pengutronix.de> wrote:

> Hi,
> 
> i'll CC more J1939 users to the discussion.

Thanks for the CC.

> On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:
> > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > the user-space implementation to decide how to manage it.

I think this is not entirely correct. AFAICS the 250ms wait is indeed
explicitly stated.
The following is taken from ISO 11783-5:

In "4.4.4.3 Address violation" it states that "If a CF receives a message,
other than the address-claimed message, which uses the CF’s own SA, then the
CF [...] shall send the address-claim message to the Global address."

So the CF shall claim its address again. But further down, in "4.5.2 Address
claim requirements" it is stated that "...No CF shall begin, or resume,
transmission on the network until 250 ms after it has successfully claimed an
address".

At this moment, the address is in dispute. The affected CFs are not allowed to
send any other messages until this dispute is resolved, and the standard
requires a waiting time of 250ms which is minimally deemed necessary to give
all participants time to respond and eventually dispute the address claim.

If the offending CF ignores this dispute and keeps sending incorrect messages
faster than every 250ms, then effectively the other CF has no chance to ever
resume normal operation because its address is still disputed.

According to 4.4.4.3 it is also required to set a DTC, but it will not be
allowed to send the DM1 message unless the address dispute is resolved.

This effectively allows the offending CF to DoS the affected CF if it keeps
sending offending messages. Unfortunately, neither J1939 nor ISOBUS takes
adversarial behavior on the CAN network into account, so we cannot do anything
about this.
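
The effect described above can be illustrated with a small C model. It
assumes a CF that re-claims its address immediately on every offending
message and then stays silent for 250 ms; the names CLAIM_HOLDOFF_MS,
may_transmit() and open_window_ms() are hypothetical and this is not the
kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hold-off after an address claim, per the 250 ms rule quoted above. */
#define CLAIM_HOLDOFF_MS 250u

/* True once at least 250 ms have passed since the last address claim. */
static bool may_transmit(uint32_t now_ms, uint32_t last_claim_ms)
{
	return (uint32_t)(now_ms - last_claim_ms) >= CLAIM_HOLDOFF_MS;
}

/*
 * Count the milliseconds in [0, duration_ms) during which the affected
 * CF may transmit, when an offending message arrives every period_ms
 * (period_ms must be non-zero) and each one triggers a new claim.
 */
static uint32_t open_window_ms(uint32_t period_ms, uint32_t duration_ms)
{
	uint32_t open = 0;
	uint32_t last_claim = 0;

	for (uint32_t t = 0; t < duration_ms; t++) {
		if (t % period_ms == 0)
			last_claim = t;	/* violation seen, address re-claimed */
		if (may_transmit(t, last_claim))
			open++;
	}
	return open;
}
```

In this model an offender sending every 200 ms leaves no transmit window at
all, while an offender sending once per second, as in Devid's trace, leaves
750 ms of every second open.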

As for the ISOBUS compliance tool that is mentioned by Devid, IMHO this
compliance tool should be challenged and fixed, since it is broken.

The networking layer prohibits the DM1 message from being sent, and the
networking layer takes precedence over all higher protocol layers, so the
diagnostics layer is not able to operate at this moment.

Best regards,

-- 
David Jander
Protonic Holland.

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-11  9:06         ` David Jander
@ 2022-05-11 12:55           ` Devid Antonio Filoni
  2022-05-11 14:22             ` David Jander
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-05-11 12:55 UTC (permalink / raw)
  To: David Jander
  Cc: Oleksij Rempel, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> I think this is not entirely correct. AFAICS the 250ms wait is indeed
> explicitly stated.
> The following is taken from ISO 11783-5:
> 
> In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> other than the address-claimed message, which uses the CF’s own SA, then the
> CF [...] shall send the address-claim message to the Global address."
> 
> So the CF shall claim its address again. But further down, in "4.5.2 Address
> claim requirements" it is stated that "...No CF shall begin, or resume,
> transmission on the network until 250 ms after it has successfully claimed an
> address".
> 
> At this moment, the address is in dispute. The affected CFs are not allowed to
> send any other messages until this dispute is resolved, and the standard
> requires a waiting time of 250ms which is minimally deemed necessary to give
> all participants time to respond and eventually dispute the address claim.
> 
> If the offending CF ignores this dispute and keeps sending incorrect messages
> faster than every 250ms, then effectively the other CF has no chance to ever
> resume normal operation because its address is still disputed.
> 
> According to 4.4.4.3 it is also required to set a DTC, but it will not be
> allowed to send the DM1 message unless the address dispute is resolved.
> 
> This effectively allows the offending CF to DoS the affected CF if it keeps
> sending offending messages. Unfortunately, neither J1939 nor ISOBUS takes
> adversarial behavior on the CAN network into account, so we cannot do anything
> about this.
> 
> As for the ISOBUS compliance tool that is mentioned by Devid, IMHO this
> compliance tool should be challenged and fixed, since it is broken.
> 
> The networking layer prohibits the DM1 message from being sent, and the
> networking layer takes precedence over all higher protocol layers, so the
> diagnostics layer is not able to operate at this moment.
> 
> Best regards,
> 
> 

Hi David,

I get your point but I'm not sure that it is the correct interpretation
that should be applied in this particular case for the following
reasons:

- In "4.5.2 Address claim requirements" it is explicitly stated that
"The CF shall claim its own address when initializing and when
responding to a command to change its NAME or address" and this seems to
completely ignore the "4.4.4.3 Address violation" that states that the
address-claimed message shall be sent also when "the CF receives a
message, other than the address-claimed message, which uses the CF's own
SA".
Please note that the address was already claimed by the CF, so I think
that the initialization requirements should not apply in this case since
all disputes were already resolved.

- If the offending CF ignores the dispute, as you said, then the other
CF has no chance to ever resume normal operation, and so the network
cannot be made aware that the other CF is not working correctly because
the offending CF is spoofing that CF's address. This seems to make the
requirement to activate the DTC in "4.4.4.3 Address violation"
useless.

Regards,
Devid


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-11 12:55           ` Devid Antonio Filoni
@ 2022-05-11 14:22             ` David Jander
  2022-05-13  9:46               ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: David Jander @ 2022-05-11 14:22 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Oleksij Rempel, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel


Hi Devid,

On Wed, 11 May 2022 14:55:04 +0200
Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:

> On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> > Hi,
> > 
> > On Wed, 11 May 2022 10:47:28 +0200
> > Oleksij Rempel <
> > o.rempel@pengutronix.de  
> > > wrote:  
> >   
> > > Hi,
> > > 
> > > i'll CC more J1939 users to the discussion.  
> > 
> > Thanks for the CC.
> >   
> > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:  
> > > > Hi,
> > > > 
> > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:    
> > > > > Hi,
> > > > > 
> > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:    
> > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:    
> > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > ISO-11783 certification do not expect this wait.    
> > > > > 
> > > > > It will be interesting to know which certification tool do not expect it and
> > > > > what explanation is used if it fails?
> > > > >     
> > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > 
> > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > 
> > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > you have collected on the bus.
> > > > > > 
> > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > but bigger internal queues also increase this window.
> > > > > > 
> > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > 
> > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > 
> > > > > > Kind regards,
> > > > > > Kurt
> > > > > >     
> > > > > 
> > > > >     
> > > > 
> > > > The test that is being executed during the ISOBUS compliance is the
> > > > following: after an address has been claimed by a CF (#1), another CF
> > > > (#2) sends a  message (other than address-claim) using the same address
> > > > claimed by CF #1.
> > > > 
> > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > - shall send the address-claim message to the Global address;
> > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > 31
> > > > 
> > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > standard:
> > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > the claim procedure with a new address
> > > > > - If the name of the CF #1 has a higher priority than that of the CF #2,
> > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > execute the claim procedure with a new address
> > > > 
> > > > Above conflict management is OK with current J1939 driver
> > > > implementation, however, since the driver always waits 250ms after
> > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > message which is expected to be sent each second (as per J1939-73
> > > > standard) may not be sent.
> > > > 
> > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > tests on our products and which tool they use as it was chosen by our
> > > > > customer, however they did send us some CAN traces of previously
> > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > the address-claim message (but it may also be less than that), and this
> > > > > is something that we cannot do because the driver blocks the application
> > > > > from sending it.
> > > > 
> > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > with other CF's address
> > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > Claim - SA = F0
> > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > with other CF's address
> > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > Claim - SA = F0
> > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > with other CF's address
> > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > Claim - SA = F0
> > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > 
> > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > the user-space implementation to decide how to manage it.  
> > 
> > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > explicitly stated.
> > The following is taken from ISO 11783-5:
> > 
> > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > other than the address-claimed message, which uses the CF’s own SA, then the
> > CF [...] shall send the address-claim message to the Global address."
> > 
> > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > claim requirements" it is stated that "...No CF shall begin, or resume,
> > transmission on the network until 250 ms after it has successfully claimed an
> > address".
> > 
> > At this moment, the address is in dispute. The affected CFs are not allowed to
> > send any other messages until this dispute is resolved, and the standard
> > requires a waiting time of 250ms which is minimally deemed necessary to give
> > all participants time to respond and eventually dispute the address claim.
> > 
> > If the offending CF ignores this dispute and keeps sending incorrect messages
> > faster than every 250ms, then effectively the other CF has no chance to ever
> > resume normal operation because its address is still disputed.
> > 
> > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > allowed to send the DM1 message unless the address dispute is resolved.
> > 
> > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > account adversarial behavior on the CAN network, so we cannot do anything
> > about this.
> > 
> > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > compliance tool should be challenged and fixed, since it is broken.
> > 
> > The networking layer is prohibiting the DM1 message to be sent, and the
> > networking layer has precedence above all superior protocol layers, so the
> > diagnostics layer is not able to operate at this moment.
> > 
> > Best regards,
> > 
> >   
> 
> Hi David,
> 
> I get your point but I'm not sure that it is the correct interpretation
> that should be applied in this particular case for the following
> reasons:
> 
> - In "4.5.2 Address claim requirements" it is explicitly stated that
> "The CF shall claim its own address when initializing and when
> responding to a command to change its NAME or address" and this seems to

The standard unfortunately has a track record of ignoring a lot of scenarios
and corner cases. In this instance it ignores the fact that new participants
can appear on the bus _after_ initialization has long finished, in which case
a CF may need to claim its address again.

But look at point d) of that same section: "No CF shall begin, or resume,
transmission on the network until 250 ms after it has successfully claimed an
address (Figure 4). This does not apply when responding to a request for
address claimed."

So we basically have two situations when this will apply after the network is
up and running and a new node suddenly appears:

 1. The new node starts with a "Request for address claimed" message, to
 which your CF should respond with an "Address Claimed" message and NOT wait
 250ms.

or

 2. The new node creates an addressing conflict either by claiming its address
 without first sending a "request for address claimed" message or (and this is
 your case) simply using its address without claiming it first.

It is this second possibility where there is a conflict that must be resolved,
and then you must wait 250ms after claiming the conflicting address for
yourself.
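
A rough sketch of that decision, in plain C rather than kernel code. The
enum, the function name and the constant are made up for illustration and
are not taken from net/can/j1939:

```c
#include <stdint.h>

/* Hypothetical classification of what made us send "Address Claimed". */
enum ac_trigger {
	AC_TRIGGER_REQUEST_FOR_CLAIM, /* peer sent "Request for address claimed" */
	AC_TRIGGER_COMPETING_CLAIM,   /* peer claimed our SA without asking first */
	AC_TRIGGER_SA_VIOLATION,      /* peer used our SA in a non-claim message */
};

#define AC_HOLDOFF_MS 250u /* wait quoted from 4.5.2 d) above */

/* Hold-off (in ms) to observe after sending our Address Claimed message. */
uint32_t ac_holdoff_ms(enum ac_trigger trigger)
{
	switch (trigger) {
	case AC_TRIGGER_REQUEST_FOR_CLAIM:
		return 0; /* case 1: answering a request, no wait */
	case AC_TRIGGER_COMPETING_CLAIM:
	case AC_TRIGGER_SA_VIOLATION:
		return AC_HOLDOFF_MS; /* case 2: conflict, wait */
	}
	return AC_HOLDOFF_MS; /* unknown trigger: assume a conflict */
}
```

Falling back to the full wait for an unknown trigger is my own assumption,
chosen so that a misclassified event never shortens the hold-off.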

> completely ignore the "4.4.4.3 Address violation" that states that the
> address-claimed message shall be sent also when "the CF receives a
> message, other than the address-claimed message, which uses the CF's own
> SA".
> Please note that the address was already claimed by the CF, so I think
> that the initialization requirements should not apply in this case since
> all disputes were already resolved.

Well, yes and no. The address was claimed before, yes, but then a new node came
onto the bus and disputed that address. In that case the dispute needs to be
resolved first. Imagine you did NOT wait 250ms, and the other CF had correctly
claimed its address, but you did not receive that message for some reason.
Now also assume that your own NAME has a lower priority than that of the other
CF. In this case you can send an "Address Claimed" message to claim your
address again, but it will be contested. If you don't wait for the contestant,
it is you who will be in violation of the protocol, because you should have
changed your own address but failed to do so.
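
To illustrate why the wait matters, here is a minimal sketch. It assumes
the usual J1939 rule that the numerically lower 64-bit NAME has the higher
priority. The struct and function names are hypothetical, not the
kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

#define AC_HOLDOFF_MS 250u

/* Hypothetical per-CF claim state. */
struct ac_state {
	uint64_t own_name;      /* our 64-bit NAME */
	uint32_t claim_sent_ms; /* when we last sent Address Claimed */
	bool     lost;          /* a higher-priority NAME contested our SA */
};

/* A peer answered our claim with its own Address Claimed for the same SA. */
void ac_on_peer_claim(struct ac_state *st, uint64_t peer_name)
{
	if (peer_name < st->own_name) /* lower value = higher priority */
		st->lost = true;
}

/* May we resume normal transmission with this SA at time now_ms? */
bool ac_may_transmit(const struct ac_state *st, uint32_t now_ms)
{
	if (st->lost)
		return false; /* must send Cannot Claim or pick a new SA */
	/* unsigned subtraction stays correct across timer wrap-around */
	return (uint32_t)(now_ms - st->claim_sent_ms) >= AC_HOLDOFF_MS;
}
```

If ac_may_transmit() ignored the hold-off, a CF with the lower-priority
NAME could already be sending with the SA before the contestant's claim
arrives.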

> - If the offending CF ignores the dispute, as you said, then the other
> CF has no chance to ever resume normal operation and so the network
> cannot be aware that the other CF is not working correctly because the
> offending CF is spoofing its own address.

Correct. And like I said in my previous reply, this is unfortunately how CAN,
J1939 and ISObus work. The whole network must cooperate and there is no
consideration for malign or adversarial actors.
There are also a lot of possible corner cases that these standards
unfortunately do not take into account. Conformance test tools seem to be even
more problematic and tend to have bugs quite often. I am still inclined to
think this is the case with your test tool.

> This seems to make useless the
> requirement that states to activate the DTC in "4.4.4.3 Address
> violation".

The requirement is not useless. You can still set and store the DTC, just not
broadcast it to the network at that moment.
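
Something along these lines would do it in the application. This is only
a sketch: the struct and helpers are invented for illustration, and the
SPN = 2000+SA / FMI = 31 values are the ones you cited earlier in this
thread:

```c
#include <stdbool.h>
#include <stdint.h>

#define FMI_ADDRESS_VIOLATION 31u /* FMI cited earlier in the thread */

/* Hypothetical stored DTC with a "DM1 still to be sent" marker. */
struct dtc {
	uint32_t spn;
	uint8_t  fmi;
	bool     active;      /* DTC is set and stored */
	bool     dm1_pending; /* not yet broadcast via DM1 */
};

/* Record the address-violation DTC right away; do not transmit here. */
void dtc_raise_address_violation(struct dtc *d, uint8_t sa)
{
	d->spn = 2000u + sa;
	d->fmi = FMI_ADDRESS_VIOLATION;
	d->active = true;
	d->dm1_pending = true;
}

/* Returns true exactly once, when the pending DM1 may finally be sent. */
bool dtc_take_dm1(struct dtc *d, bool tx_allowed)
{
	if (!d->dm1_pending || !tx_allowed)
		return false;
	d->dm1_pending = false;
	return true;
}
```

For SA 0xF0 this gives SPN 2240 (0x08C0), which, if I read the DM1 layout
correctly, matches the "C0 08 1F" bytes in the trace quoted above.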

Best regards,

-- 
David Jander
Protonic Holland.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-11 14:22             ` David Jander
@ 2022-05-13  9:46               ` Devid Antonio Filoni
  2022-11-17 14:08                 ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-05-13  9:46 UTC (permalink / raw)
  To: David Jander
  Cc: Oleksij Rempel, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

Hi David,

On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:
> Hi Devid,
> 
> On Wed, 11 May 2022 14:55:04 +0200
> Devid Antonio Filoni <
> devid.filoni@egluetechnologies.com
> > wrote:
> 
> > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> > > Hi,
> > > 
> > > On Wed, 11 May 2022 10:47:28 +0200
> > > Oleksij Rempel <
> > > o.rempel@pengutronix.de
> > >   
> > > > wrote:  
> > > 
> > >   
> > > > Hi,
> > > > 
> > > > i'll CC more J1939 users to the discussion.  
> > > 
> > > Thanks for the CC.
> > >   
> > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:  
> > > > > Hi,
> > > > > 
> > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:    
> > > > > > Hi,
> > > > > > 
> > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:    
> > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:    
> > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > ISO-11783 certification do not expect this wait.    
> > > > > > 
> > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > what explanation is used if it fails?
> > > > > >     
> > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > 
> > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > 
> > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > you have collected on the bus.
> > > > > > > 
> > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > but bigger internal queues also increase this window.
> > > > > > > 
> > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > 
> > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > 
> > > > > > > Kind regards,
> > > > > > > Kurt
> > > > > > >     
> > > > > > 
> > > > > >     
> > > > > 
> > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > claimed by CF #1.
> > > > > 
> > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > - shall send the address-claim message to the Global address;
> > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > 31
> > > > > 
> > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > standard:
> > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > the claim procedure with a new address
> > > > > > - If the name of the CF #1 has a higher priority than that of the CF #2,
> > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > execute the claim procedure with a new address
> > > > > 
> > > > > Above conflict management is OK with current J1939 driver
> > > > > implementation, however, since the driver always waits 250ms after
> > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > message which is expected to be sent each second (as per J1939-73
> > > > > standard) may not be sent.
> > > > > 
> > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > customer, however they did send us some CAN traces of previously
> > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > the address-claim message (but it may also be less than that), and this
> > > > > > is something that we cannot do because the driver blocks the application
> > > > > > from sending it.
> > > > > 
> > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > with other CF's address
> > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > Claim - SA = F0
> > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > with other CF's address
> > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > Claim - SA = F0
> > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > with other CF's address
> > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > Claim - SA = F0
> > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > 
> > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > the user-space implementation to decide how to manage it.  
> > > 
> > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > explicitly stated.
> > > The following is taken from ISO 11783-5:
> > > 
> > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > CF [...] shall send the address-claim message to the Global address."
> > > 
> > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > transmission on the network until 250 ms after it has successfully claimed an
> > > address".
> > > 
> > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > send any other messages until this dispute is resolved, and the standard
> > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > all participants time to respond and eventually dispute the address claim.
> > > 
> > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > resume normal operation because its address is still disputed.
> > > 
> > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > allowed to send the DM1 message unless the address dispute is resolved.
> > > 
> > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > account adversarial behavior on the CAN network, so we cannot do anything
> > > about this.
> > > 
> > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > compliance tool should be challenged and fixed, since it is broken.
> > > 
> > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > networking layer has precedence above all superior protocol layers, so the
> > > diagnostics layer is not able to operate at this moment.
> > > 
> > > Best regards,
> > > 
> > >   
> > 
> > Hi David,
> > 
> > I get your point but I'm not sure that it is the correct interpretation
> > that should be applied in this particular case for the following
> > reasons:
> > 
> > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > "The CF shall claim its own address when initializing and when
> > responding to a command to change its NAME or address" and this seems to
> 
> The standard unfortunately has a track record of ignoring a lot of scenarios
> and corner cases, like in this instance the fact that there can appear new
> participants on the bus _after_ initialization has long finished, and it would
> need to claim its address again in that case.
> 
> But look at point d) of that same section: "No CF shall begin, or resume,
> transmission on the network until 250 ms after it has successfully claimed an
> address (Figure 4). This does not apply when responding to a request for
> address claimed."
> 
> So we basically have two situations when this will apply after the network is
> up and running and a new node suddenly appears:
> 
>  1. The new node starts with a "Request for address claimed" message, to
>  which your CF should respond with an "Address Claimed" message and NOT wait
>  250ms.
> 
> or
> 
>  2. The new node creates an addressing conflict either by claiming its address
>  without first sending a "request for address claimed" message or (and this is
>  your case) simply using its address without claiming it first.
> 
> It is this second possibility where there is a conflict that must be resolved,
> and then you must wait 250ms after claiming the conflicting address for
> yourself.
> 
> > completely ignore the "4.4.4.3 Address violation" that states that the
> > address-claimed message shall be sent also when "the CF receives a
> > message, other than the address-claimed message, which uses the CF's own
> > SA".
> > Please note that the address was already claimed by the CF, so I think
> > that the initialization requirements should not apply in this case since
> > all disputes were already resolved.
> 
> Well, yes and no. The address was claimed before, yes, but then a new node came
> onto the bus and disputed that address. In that case the dispute needs to be
> resolved first. Imagine you would NOT wait 250ms, but the other CF did
> correctly claim its address, but it was you who did not receive that message
> for some reason. Now also assume that your own NAME has a lower priority than
> the other CF. In this case you can send a "claimed address" message to claim
> your address again, but it will be contested. If you don't wait for the
> contestant, it is you who will be in violation of the protocol, because you
> should have changed your own address but failed to do so.
> 
> > - If the offending CF ignores the dispute, as you said, then the other
> > CF has no chance to ever resume normal operation and so the network
> > cannot be aware that the other CF is not working correctly because the
> > offending CF is spoofing its own address.
> 
> Correct. And like I said in my previous reply, this is unfortunately how CAN,
> J1939 and ISObus work. The whole network must cooperate and there is no
> consideration for malign or adversarial actors.
> There are also a lot of possible corner cases that these standards
> unfortunately do not take into account. Conformance test tools seem to be even
> more problematic and tend to have bugs quite often. I am still inclined to
> think this is the case with your test tool.
> 
> > This seems to make useless the
> > requirement that states to activate the DTC in "4.4.4.3 Address
> > violation".
> 
> The requirement is not useless. You can still set and store the DTC, just not
> broadcast it to the network at that moment.
> 
> Best regards,
> 
> 

Thank you for your feedback and explanation.
I asked the customer to contact the compliance company so that we can
verify this particular use-case with them. I want to understand whether
there is an application note or exception that states how to manage it,
or whether they implemented the test based on their own interpretation,
and how it really works: if the test does not check for the presence of
the DM1 message, then it could be passed even without sending DM1
during the 250ms after the address-claimed message.

Best regards,
Devid


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-05-13  9:46               ` Devid Antonio Filoni
@ 2022-11-17 14:08                 ` Devid Antonio Filoni
  2022-11-17 15:22                   ` David Jander
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-17 14:08 UTC (permalink / raw)
  To: David Jander
  Cc: Oleksij Rempel, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> Hi David,
> 
> On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:
> > Hi Devid,
> > 
> > On Wed, 11 May 2022 14:55:04 +0200
> > Devid Antonio Filoni <
> > devid.filoni@egluetechnologies.com
> > > wrote:
> > 
> > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:
> > > > Hi,
> > > > 
> > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > Oleksij Rempel <
> > > > o.rempel@pengutronix.de
> > > >   
> > > > > wrote:  
> > > > 
> > > >   
> > > > > Hi,
> > > > > 
> > > > > i'll CC more J1939 users to the discussion.  
> > > > 
> > > > Thanks for the CC.
> > > >   
> > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:  
> > > > > > Hi,
> > > > > > 
> > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:    
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:    
> > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > ISO-11783 certification do not expect this wait.    
> > > > > > > 
> > > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > > what explanation is used if it fails?
> > > > > > >     
> > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > 
> > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > > 
> > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > you have collected on the bus.
> > > > > > > > 
> > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > 
> > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > 
> > > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > > 
> > > > > > > > Kind regards,
> > > > > > > > Kurt
> > > > > > > >     
> > > > > > > 
> > > > > > >     
> > > > > > 
> > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > claimed by CF #1.
> > > > > > 
> > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > - shall send the address-claim message to the Global address;
> > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > 31
> > > > > > 
> > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > standard:
> > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > the claim procedure with a new address
> > > > > > > - If the name of the CF #1 has a higher priority than that of the CF #2,
> > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > execute the claim procedure with a new address
> > > > > > 
> > > > > > Above conflict management is OK with current J1939 driver
> > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > standard) may not be sent.
> > > > > > 
> > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > the address-claim message (but it may also be less than that), and this
> > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > from sending it.
> > > > > > 
> > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > with other CF's address
> > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > Claim - SA = F0
> > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > with other CF's address
> > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > Claim - SA = F0
> > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > with other CF's address
> > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > Claim - SA = F0
> > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > 
> > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > the user-space implementation to decide how to manage it.  
> > > > 
> > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > explicitly stated.
> > > > The following is taken from ISO 11783-5:
> > > > 
> > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > CF [...] shall send the address-claim message to the Global address."
> > > > 
> > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > address".
> > > > 
> > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > send any other messages until this dispute is resolved, and the standard
> > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > all participants time to respond and eventually dispute the address claim.
> > > > 
> > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > resume normal operation because its address is still disputed.
> > > > 
> > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > 
> > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > about this.
> > > > 
> > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > compliance tool should be challenged and fixed, since it is broken.
> > > > 
> > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > networking layer has precedence above all superior protocol layers, so the
> > > > diagnostics layer is not able to operate at this moment.
> > > > 
> > > > Best regards,
> > > > 
> > > >   
> > > 
> > > Hi David,
> > > 
> > > I get your point but I'm not sure that it is the correct interpretation
> > > that should be applied in this particular case for the following
> > > reasons:
> > > 
> > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > "The CF shall claim its own address when initializing and when
> > > responding to a command to change its NAME or address" and this seems to
> > 
> > The standard unfortunately has a track record of ignoring a lot of scenarios
> > and corner cases, like in this instance the fact that there can appear new
> > participants on the bus _after_ initialization has long finished, and it would
> > need to claim its address again in that case.
> > 
> > But look at point d) of that same section: "No CF shall begin, or resume,
> > transmission on the network until 250 ms after it has successfully claimed an
> > address (Figure 4). This does not apply when responding to a request for
> > address claimed."
> > 
> > So we basically have two situations when this will apply after the network is
> > up and running and a new node suddenly appears:
> > 
> >  1. The new node starts with a "Request for address claimed" message, to
> >  which your CF should respond with an "Address Claimed" message and NOT wait
> >  250ms.
> > 
> > or
> > 
> >  2. The new node creates an addressing conflict either by claiming its address
> >  without first sending a "request for address claimed" message or (and this is
> >  your case) simply using its address without claiming it first.
> > 
> > It is this second possibility where there is a conflict that must be resolved,
> > and then you must wait 250ms after claiming the conflicting address for
> > yourself.
> > 
> > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > address-claimed message shall be sent also when "the CF receives a
> > > message, other than the address-claimed message, which uses the CF's own
> > > SA".
> > > Please note that the address was already claimed by the CF, so I think
> > > that the initialization requirements should not apply in this case since
> > > all disputes were already resolved.
> > 
> > Well, yes and no. The address was claimed before, yes, but then a new node came
> > onto the bus and disputed that address. In that case the dispute needs to be
> > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > correctly claim its address, but it was you who did not receive that message
> > for some reason. Now also assume that your own NAME has a lower priority than
> > the other CF. In this case you can send a "claimed address" message to claim
> > your address again, but it will be contested. If you don't wait for the
> > contestant, it is you who will be in violation of the protocol, because you
> > should have changed your own address but failed to do so.
> > 
> > > - If the offending CF ignores the dispute, as you said, then the other
> > > CF has no chance to ever resume normal operation and so the network
> > > cannot be aware that the other CF is not working correctly because the
> > > offending CF is spoofing its own address.
> > 
> > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > J1939 and ISObus work. The whole network must cooperate and there is no
> > consideration for malign or adversarial actors.
> > There are also a lot of possible corner cases that these standards
> > unfortunately do not take into account. Conformance test tools seem to be even
> > more problematic and tend to have bugs quite often. I am still inclined to
> > think this is the case with your test tool.
> > 
> > > This seems to make useless the
> > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > violation".
> > 
> > The requirement is not useless. You can still set and store the DTC, just not
> > broadcast it to the network at that moment.
> > 
> > Best regards,
> > 
> > 
> 
> Thank you for your feedback and explanation.
> I asked the customer to contact the compliance company so that we can
> verify with them this particular use-case. I want to understand if there
> is an application note or exception that states how to manage it or if
> they implemented the test basing it on their own interpretation and how
> it really works: supposing that the test does not check the DM1
> presence, then the test could be passed even without sending the DM1
> message during the 250ms after the address-claimed message.
> 
> Best regards,
> Devid

Hi David, all,

I'm sorry for resuming this discussion after a long time, but I noticed
that the driver forces the 250 ms wait even when responding to a request
for address-claimed, which contradicts point d) of ISO 11783-5 "4.5.2
Address claim requirements":

No CF shall begin, or resume, transmission on the network until 250 ms
after it has successfully claimed an address (see Figure 4), except
when responding to a request for address-claimed.

IMHO the driver should either detect the above condition or not force
the 250 ms wait at all; in the latter case the wait would be implemented,
depending on the case, in the user-space application.

Thank you, best regards,
Devid


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-17 14:08                 ` Devid Antonio Filoni
@ 2022-11-17 15:22                   ` David Jander
  2022-11-18  6:06                     ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: David Jander @ 2022-11-17 15:22 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Oleksij Rempel, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Thu, 17 Nov 2022 15:08:20 +0100
Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:

> On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > Hi David,
> > 
> > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > Hi Devid,
> > > 
> > > On Wed, 11 May 2022 14:55:04 +0200
> > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > >   
> > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > Hi,
> > > > > 
> > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > > > > 
> > > > >     
> > > > > > Hi,
> > > > > > 
> > > > > > i'll CC more J1939 users to the discussion.    
> > > > > 
> > > > > Thanks for the CC.
> > > > >     
> > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > 
> > > > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > > > what explanation is used if it fails?
> > > > > > > >       
> > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > 
> > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > > > 
> > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > you have collected on the bus.
> > > > > > > > > 
> > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > 
> > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > 
> > > > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > > > 
> > > > > > > > > Kind regards,
> > > > > > > > > Kurt
> > > > > > > > >       
> > > > > > > > 
> > > > > > > >       
> > > > > > > 
> > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > claimed by CF #1.
> > > > > > > 
> > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > 31
> > > > > > > 
> > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > standard:
> > > > > > > > - If the name of the CF #1 has a lower priority than that of the CF
> > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > the claim procedure with a new address
> > > > > > > > - If the name of the CF #1 has a higher priority than that of the CF #2,
> > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > execute the claim procedure with a new address
> > > > > > > 
> > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > standard) may not be sent.
> > > > > > > 
> > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > the address-claim message (but it may also be lower than that), and this
> > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > from sending it.
> > > > > > > 
> > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > with other CF's address
> > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > Claim - SA = F0
> > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > with other CF's address
> > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > Claim - SA = F0
> > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > with other CF's address
> > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > Claim - SA = F0
> > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > 
> > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > the user-space implementation to decide how to manage it.    
> > > > > 
> > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > explicitly stated.
> > > > > The following is taken from ISO 11783-5:
> > > > > 
> > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > 
> > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > address".
> > > > > 
> > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > send any other messages until this dispute is resolved, and the standard
> > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > all participants time to respond and eventually dispute the address claim.
> > > > > 
> > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > resume normal operation because its address is still disputed.
> > > > > 
> > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > 
> > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > about this.
> > > > > 
> > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > 
> > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > diagnostics layer is not able to operate at this moment.
> > > > > 
> > > > > Best regards,
> > > > > 
> > > > >     
> > > > 
> > > > Hi David,
> > > > 
> > > > I get your point but I'm not sure that it is the correct interpretation
> > > > that should be applied in this particular case for the following
> > > > reasons:
> > > > 
> > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > "The CF shall claim its own address when initializing and when
> > > > responding to a command to change its NAME or address" and this seems to  
> > > 
> > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > and corner cases, like in this instance the fact that there can appear new
> > > participants on the bus _after_ initialization has long finished, and it would
> > > need to claim its address again in that case.
> > > 
> > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > transmission on the network until 250 ms after it has successfully claimed an
> > > address (Figure 4). This does not apply when responding to a request for
> > > address claimed."
> > > 
> > > So we basically have two situations when this will apply after the network is
> > > up and running and a new node suddenly appears:
> > > 
> > >  1. The new node starts with a "Request for address claimed" message, to
> > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > >  250ms.
> > > 
> > > or
> > > 
> > >  2. The new node creates an addressing conflict either by claiming its address
> > >  without first sending a "request for address claimed" message or (and this is
> > >  your case) simply using its address without claiming it first.
> > > 
> > > It is this second possibility where there is a conflict that must be resolved,
> > > and then you must wait 250ms after claiming the conflicting address for
> > > yourself.
> > >   
> > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > address-claimed message shall be sent also when "the CF receives a
> > > > message, other than the address-claimed message, which uses the CF's own
> > > > SA".
> > > > Please note that the address was already claimed by the CF, so I think
> > > > that the initialization requirements should not apply in this case since
> > > > all disputes were already resolved.  
> > > 
> > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > onto the bus and disputed that address. In that case the dispute needs to be
> > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > correctly claim its address, but it was you who did not receive that message
> > > for some reason. Now also assume that your own NAME has a lower priority than
> > > the other CF. In this case you can send a "claimed address" message to claim
> > > your address again, but it will be contested. If you don't wait for the
> > > contestant, it is you who will be in violation of the protocol, because you
> > > should have changed your own address but failed to do so.
> > >   
> > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > CF has no chance to ever resume normal operation and so the network
> > > > cannot be aware that the other CF is not working correctly because the
> > > > offending CF is spoofing its own address.  
> > > 
> > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > consideration for malign or adversarial actors.
> > > There are also a lot of possible corner cases that these standards
> > > unfortunately do not take into account. Conformance test tools seem to be even
> > > more problematic and tend to have bugs quite often. I am still inclined to
> > > think this is the case with your test tool.
> > >   
> > > > This seems to make useless the
> > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > violation".  
> > > 
> > > The requirement is not useless. You can still set and store the DTC, just not
> > > broadcast it to the network at that moment.
> > > 
> > > Best regards,
> > > 
> > >   
> > 
> > Thank you for your feedback and explanation.
> > I asked the customer to contact the compliance company so that we can
> > verify with them this particular use-case. I want to understand if there
> > is an application note or exception that states how to manage it or if
> > they implemented the test basing it on their own interpretation and how
> > it really works: supposing that the test does not check the DM1
> > presence, then the test could be passed even without sending the DM1
> > message during the 250ms after the address-claimed message.
> > 
> > Best regards,
> > Devid  
> 
> Hi David, all,
> 
> I'm sorry for resuming this discussion after a long time but I noticed
> that the driver forces the 250 ms wait even when responding to a request
> for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> Address claim requirements":
> 
> No CF shall begin, or resume, transmission on the network until 250 ms
> after it has successfully claimed an address (see Figure 4), except
> when responding to a request for address-claimed.
> 
> IMHO the driver shall be able to detect above condition or shall not
> force the 250 ms wait which should then be implemented, depending on the
> case, on user-space application side.

I am a bit out of the loop with this driver, but I think what you say is
correct. The J1939 stack should NOT unconditionally stay silent for 250ms
after sending an Address Claimed message. It should specifically NOT do so if
it is just responding to a Request for Address Claimed message.

So if the J1939 stack does indeed forcibly hold off sending messages after
sending an Address Claimed message in reply to a Request for Address
Claimed, then I'd say this is a bug.
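
To make the distinction concrete, here is a minimal sketch of the rule as
it is read in this thread. It is not the driver's code: every identifier
below is hypothetical and nothing here exists in net/can/j1939. It assumes
the stack can tell why it is sending an Address Claimed message, and it
treats the address-violation case as requiring the wait, which is the
interpretation given earlier in this discussion and not a settled reading
of the standard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in net/can/j1939. */

/* Hold-off from ISO 11783-5, 4.5.2 d), in milliseconds. */
#define SKETCH_AC_HOLDOFF_MS 250u

/* Why this node is sending an Address Claimed message. */
enum sketch_claim_reason {
	SKETCH_CLAIM_INIT,             /* initial claim, or NAME/address change */
	SKETCH_CLAIM_REPLY_TO_REQUEST, /* answer to a Request for Address Claimed */
	SKETCH_CLAIM_ADDR_VIOLATION,   /* another CF used our SA (4.4.4.3) */
};

/* How long to stay silent after sending the claim, per reason. */
static uint32_t sketch_holdoff_ms(enum sketch_claim_reason reason)
{
	switch (reason) {
	case SKETCH_CLAIM_REPLY_TO_REQUEST:
		/* Explicit exception in point d): no wait. */
		return 0;
	case SKETCH_CLAIM_INIT:
	case SKETCH_CLAIM_ADDR_VIOLATION:
		/* Address may still be contested: wait (assumed reading). */
		return SKETCH_AC_HOLDOFF_MS;
	}
	assert(0 && "unknown claim reason");
	return SKETCH_AC_HOLDOFF_MS;
}

/* May an ordinary (non address-claim) frame be sent at now_ms? */
static bool sketch_may_transmit(uint32_t claim_sent_ms, uint32_t now_ms,
				enum sketch_claim_reason reason)
{
	/* Unsigned subtraction keeps this correct across counter wrap. */
	return (uint32_t)(now_ms - claim_sent_ms) >= sketch_holdoff_ms(reason);
}
```

With this split, a DM1 sent about 180 ms after the claim (as in the trace
quoted above) would be allowed only if the claim was a reply to a Request
for Address Claimed, and would be held back in the other two cases.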

@Oleksij, can you confirm this?

Best regards,

-- 
David Jander
Protonic Holland.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-17 15:22                   ` David Jander
@ 2022-11-18  6:06                     ` Oleksij Rempel
  2022-11-18 10:25                       ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-18  6:06 UTC (permalink / raw)
  To: David Jander
  Cc: Devid Antonio Filoni, Kurt Van Dijck, Robin van der Gracht,
	kernel, linux-can, Oleksij Rempel, Oliver Hartkopp,
	Marc Kleine-Budde, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Maxime Jayat, kbuild test robot, netdev, linux-kernel

On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> On Thu, 17 Nov 2022 15:08:20 +0100
> Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> 
> > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > Hi David,
> > > 
> > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > Hi Devid,
> > > > 
> > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > >   
> > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > Hi,
> > > > > > 
> > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > > > > > 
> > > > > >     
> > > > > > > Hi,
> > > > > > > 
> > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > 
> > > > > > Thanks for the CC.
> > > > > >     
> > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > 
> > > > > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > > > > what explanation is used if it fails?
> > > > > > > > >       
> > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > 
> > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > > > > 
> > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > you have collected on the bus.
> > > > > > > > > > 
> > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > 
> > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > 
> > > > > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > > > > 
> > > > > > > > > > Kind regards,
> > > > > > > > > > Kurt
> > > > > > > > > >       
> > > > > > > > > 
> > > > > > > > >       
> > > > > > > > 
> > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > claimed by CF #1.
> > > > > > > > 
> > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > 31
> > > > > > > > 
> > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > standard:
> > > > > > > > > - If the name of the CF #1 has a lower priority than that of the CF
> > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > the claim procedure with a new address
> > > > > > > > > - If the name of the CF #1 has a higher priority than that of the CF #2,
> > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > execute the claim procedure with a new address
> > > > > > > > 
> > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > standard) may not be sent.
> > > > > > > > 
> > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > the address-claim message (but it may also be lower than that), and this
> > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > from sending it.
> > > > > > > > 
> > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > > with other CF's address
> > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > > Claim - SA = F0
> > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > > with other CF's address
> > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > > Claim - SA = F0
> > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message
> > > > > > > > with other CF's address
> > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address
> > > > > > > > Claim - SA = F0
> > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > 
> > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > 
> > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > explicitly stated.
> > > > > > The following is taken from ISO 11783-5:
> > > > > > 
> > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > 
> > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > address".
> > > > > > 
> > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > 
> > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > resume normal operation because its address is still disputed.
> > > > > > 
> > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > 
> > > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > about this.
> > > > > > 
> > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > 
> > > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > 
> > > > > > Best regards,
> > > > > > 
> > > > > >     
> > > > > 
> > > > > Hi David,
> > > > > 
> > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > that should be applied in this particular case for the following
> > > > > reasons:
> > > > > 
> > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > "The CF shall claim its own address when initializing and when
> > > > > responding to a command to change its NAME or address" and this seems to  
> > > > 
> > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > and corner cases, like in this instance the fact that there can appear new
> > > > participants on the bus _after_ initialization has long finished, and it would
> > > > need to claim its address again in that case.
> > > > 
> > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > address (Figure 4). This does not apply when responding to a request for
> > > > address claimed."
> > > > 
> > > > So we basically have two situations when this will apply after the network is
> > > > up and running and a new node suddenly appears:
> > > > 
> > > >  1. The new node starts with a "Request for address claimed" message, to
> > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > >  250ms.
> > > > 
> > > > or
> > > > 
> > > >  2. The new node creates an addressing conflict either by claiming its address
> > > >  without first sending a "request for address claimed" message or (and this is
> > > >  your case) simply using its address without claiming it first.
> > > > 
> > > > It is this second possibility where there is a conflict that must be resolved,
> > > > and then you must wait 250ms after claiming the conflicting address for
> > > > yourself.
> > > >   
> > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > SA".
> > > > > Please note that the address was already claimed by the CF, so I think
> > > > > that the initialization requirements should not apply in this case since
> > > > > all disputes were already resolved.  
> > > > 
> > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > correctly claim its address, but it was you who did not receive that message
> > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > your address again, but it will be contested. If you don't wait for the
> > > > contestant, it is you who will be in violation of the protocol, because you
> > > > should have changed your own address but failed to do so.
> > > >   
> > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > CF has no chance to ever resume normal operation and so the network
> > > > > cannot be aware that the other CF is not working correctly because the
> > > > > offending CF is spoofing its own address.  
> > > > 
> > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > consideration for malign or adversarial actors.
> > > > There are also a lot of possible corner cases that these standards
> > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > think this is the case with your test tool.
> > > >   
> > > > > This seems to make useless the
> > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > violation".  
> > > > 
> > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > broadcast it to the network at that moment.
> > > > 
> > > > Best regards,
> > > > 
> > > >   
> > > 
> > > Thank you for your feedback and explanation.
> > > I asked the customer to contact the compliance company so that we can
> > > verify with them this particular use-case. I want to understand if there
> > > is an application note or exception that states how to manage it or if
> > > they implemented the test basing it on their own interpretation and how
> > > it really works: supposing that the test does not check the DM1
> > > presence, then the test could be passed even without sending the DM1
> > > message during the 250ms after the address-claimed message.
> > > 
> > > Best regards,
> > > Devid  
> > 
> > Hi David, all,
> > 
> > I'm sorry for resuming this discussion after a long time but I noticed
> > that the driver forces the 250 ms wait even when responding to a request
> > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > Address claim requirements":
> > 
> > No CF shall begin, or resume, transmission on the network until 250 ms
> > after it has successfully claimed an address (see Figure 4), except
> > when responding to a request for address-claimed.
> > 
> > IMHO the driver shall be able to detect above condition or shall not
> > force the 250 ms wait which should then be implemented, depending on the
> > case, on user-space application side.
> 
> I am a bit out of the loop with this driver, but I think what you say is
> correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> after sending an Address Claimed message. It should specifically NOT do so if
> it is just responding to a Request for Address Claimed message.
> 
> So if it is indeed so, that the J1939 stack will hold off sending messages
> forcibly after sending an Address Claimed message as a reply to a Request for
> Address Claimed, then I'd say this is a bug.
> 
> @Oleksij, can you confirm this?

I do not see any code path inside the j1939 stack that prevents you from
sending anything by address. The only part which cares about address
claiming is net/can/j1939/address-claim.c, and it will just not be able
to resolve a name to an address, because address claiming has not finished
yet. In other words, if you need to send a response to a request for
address-claimed, then just send it by using the address instead of the name.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18  6:06                     ` Oleksij Rempel
@ 2022-11-18 10:25                       ` Devid Antonio Filoni
  2022-11-18 12:30                         ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-18 10:25 UTC (permalink / raw)
  To: Oleksij Rempel, David Jander
  Cc: Kurt Van Dijck, Robin van der Gracht, kernel, linux-can,
	Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > On Thu, 17 Nov 2022 15:08:20 +0100
> > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > 
> > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > Hi David,
> > > > 
> > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > > Hi Devid,
> > > > > 
> > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > >   
> > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > > > > > > 
> > > > > > >     
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > > 
> > > > > > > Thanks for the CC.
> > > > > > >     
> > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > > 
> > > > > > > > > > It would be interesting to know which certification tool does not expect it
> > > > > > > > > > and what explanation is given when it fails.
> > > > > > > > > >       
> > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > 
> > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > > > > > 
> > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > 
> > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > 
> > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > 
> > > > > > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > > > > > 
> > > > > > > > > > > Kind regards,
> > > > > > > > > > > Kurt
> > > > > > > > > > >       
> > > > > > > > > > 
> > > > > > > > > >       
> > > > > > > > > 
> > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > > claimed by CF #1.
> > > > > > > > > 
> > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > 31
> > > > > > > > > 
> > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > standard:
> > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > the claim procedure with a new address
> > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF #2,
> > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > 
> > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > standard) may not be sent.
> > > > > > > > > 
> > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > the address-claim message (but it may also be lower than that), and this
> > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > from sending it.
> > > > > > > > > 
> > > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > 
> > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > > 
> > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > explicitly stated.
> > > > > > > The following is taken from ISO 11783-5:
> > > > > > > 
> > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > 
> > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > address".
> > > > > > > 
> > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > 
> > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > resume normal operation because its address is still disputed.
> > > > > > > 
> > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > 
> > > > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > about this.
> > > > > > > 
> > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > 
> > > > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > 
> > > > > > > Best regards,
> > > > > > > 
> > > > > > >     
> > > > > > 
> > > > > > Hi David,
> > > > > > 
> > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > that should be applied in this particular case for the following
> > > > > > reasons:
> > > > > > 
> > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > "The CF shall claim its own address when initializing and when
> > > > > > responding to a command to change its NAME or address" and this seems to  
> > > > > 
> > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > need to claim its address again in that case.
> > > > > 
> > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > address claimed."
> > > > > 
> > > > > So we basically have two situations when this will apply after the network is
> > > > > up and running and a new node suddenly appears:
> > > > > 
> > > > >  1. The new node starts with a "Request for address claimed" message, to
> > > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > > >  250ms.
> > > > > 
> > > > > or
> > > > > 
> > > > >  2. The new node creates an addressing conflict either by claiming its address
> > > > >  without first sending a "request for address claimed" message or (and this is
> > > > >  your case) simply using its address without claiming it first.
> > > > > 
> > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > yourself.
> > > > >   
> > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > SA".
> > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > that the initialization requirements should not apply in this case since
> > > > > > all disputes were already resolved.  
> > > > > 
> > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > correctly claim its address, but it was you who did not receive that message
> > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > your address again, but it will be contested. If you don't wait for the
> > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > should have changed your own address but failed to do so.
> > > > >   
> > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > offending CF is spoofing its own address.  
> > > > > 
> > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > consideration for malign or adversarial actors.
> > > > > There are also a lot of possible corner cases that these standards
> > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > think this is the case with your test tool.
> > > > >   
> > > > > > This seems to make useless the
> > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > violation".  
> > > > > 
> > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > broadcast it to the network at that moment.
> > > > > 
> > > > > Best regards,
> > > > > 
> > > > >   
> > > > 
> > > > Thank you for your feedback and explanation.
> > > > I asked the customer to contact the compliance company so that we can
> > > > verify with them this particular use-case. I want to understand if there
> > > > is an application note or exception that states how to manage it or if
> > > > they implemented the test basing it on their own interpretation and how
> > > > it really works: supposing that the test does not check the DM1
> > > > presence, then the test could be passed even without sending the DM1
> > > > message during the 250ms after the address-claimed message.
> > > > 
> > > > Best regards,
> > > > Devid  
> > > 
> > > Hi David, all,
> > > 
> > > I'm sorry for resuming this discussion after a long time but I noticed
> > > that the driver forces the 250 ms wait even when responding to a request
> > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > Address claim requirements":
> > > 
> > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > after it has successfully claimed an address (see Figure 4), except
> > > when responding to a request for address-claimed.
> > > 
> > > IMHO the driver shall be able to detect above condition or shall not
> > > force the 250 ms wait which should then be implemented, depending on the
> > > case, on user-space application side.
> > 
> > I am a bit out of the loop with this driver, but I think what you say is
> > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > after sending an Address Claimed message. It should specifically NOT do so if
> > it is just responding to a Request for Address Claimed message.
> > 
> > So if it is indeed so, that the J1939 stack will hold off sending messages
> > forcibly after sending an Address Claimed message as a reply to a Request for
> > Address Claimed, then I'd say this is a bug.
> > 
> > @Oleksij, can you confirm this?
> 
> I do not see any code path inside the j1939 stack that prevents you from
> sending anything by address. The only part which cares about address
> claiming is net/can/j1939/address-claim.c, and it will just not be able
> to resolve a name to an address, because address claiming has not finished
> yet. In other words, if you need to send a response to a request for
> address-claimed, then just send it by using the address instead of the name.
> 
> Regards,
> Oleksij

Hi Oleksij,
I'm sorry but I think I don't understand your proposal.

If I send an address-claimed message binding the socket without the name
(can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
EPROTO.
If I send the address-claimed message binding the socket with the name,
then the address-claimed message is sent successfully but other messages
sent within 250 ms are not sent (error EADDRNOTAVAIL).
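
For reference, the two bind variants described above can be sketched as
follows. This is only a minimal illustration against the SocketCAN J1939
UAPI in <linux/can/j1939.h>, not the application code under discussion:
the helper names fill_j1939_addr() and open_j1939() are made up for the
example, and the error codes mentioned afterwards are the ones reported
above, not something the sketch verifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/j1939.h>

/* Hypothetical helper: describe the local end of a J1939 socket.
 * name == J1939_NO_NAME binds by address only; any other value asks
 * the kernel to tie the socket to that 64-bit NAME.
 */
static void fill_j1939_addr(struct sockaddr_can *sa, int ifindex,
			    uint64_t name, uint8_t addr)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->can_family = AF_CAN;
	sa->can_ifindex = ifindex;
	sa->can_addr.j1939.name = name;
	sa->can_addr.j1939.pgn = J1939_NO_PGN;
	sa->can_addr.j1939.addr = addr;
}

/* Hypothetical helper: open and bind a J1939 datagram socket.
 * Returns the socket fd, or -1 with errno set.
 */
static int open_j1939(const char *ifname, uint64_t name, uint8_t addr)
{
	struct sockaddr_can sa;
	unsigned int ifindex = if_nametoindex(ifname);
	int fd, saved;

	if (!ifindex)
		return -1;

	fd = socket(PF_CAN, SOCK_DGRAM, CAN_J1939);
	if (fd < 0)
		return -1;

	fill_j1939_addr(&sa, (int)ifindex, name, addr);
	if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		saved = errno;	/* keep the bind() error across close() */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

With these helpers, open_j1939("can0", J1939_NO_NAME, 0xF0) corresponds to
the first case (sending the address-claimed message then fails with EPROTO),
and open_j1939("can0", name, 0xF0) with the CF's own 64-bit NAME corresponds
to the second case (the address-claimed message goes out, but other messages
sent within 250 ms fail with EADDRNOTAVAIL). The interface name "can0" and
the address 0xF0 are placeholders.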

Thank you,
Devid


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18 10:25                       ` Devid Antonio Filoni
@ 2022-11-18 12:30                         ` Oleksij Rempel
  2022-11-18 12:41                           ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-18 12:30 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: David Jander, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Fri, Nov 18, 2022 at 11:25:04AM +0100, Devid Antonio Filoni wrote:
> On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> > On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > > On Thu, 17 Nov 2022 15:08:20 +0100
> > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > 
> > > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > > Hi David,
> > > > > 
> > > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > > > Hi Devid,
> > > > > > 
> > > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > > >   
> > > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > > Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > > > > > > > 
> > > > > > > >     
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > > > 
> > > > > > > > Thanks for the CC.
> > > > > > > >     
> > > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > > > 
> > > > > > > > > > > It would be interesting to know which certification tool does not expect it
> > > > > > > > > > > and what explanation is given when it fails.
> > > > > > > > > > >       
> > > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > > 
> > > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid to contest the same address.
> > > > > > > > > > > > 
> > > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > > 
> > > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > > 
> > > > > > > > > > > > Would decreasing the dead time to 50msec help in such case.
> > > > > > > > > > > > 
> > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > Kurt
> > > > > > > > > > > >       
> > > > > > > > > > > 
> > > > > > > > > > >       
> > > > > > > > > > 
> > > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > > > claimed by CF #1.
> > > > > > > > > > 
> > > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > > 31
> > > > > > > > > > 
> > > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > > standard:
> > > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > > the claim procedure with a new address
> > > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF #2,
> > > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > > 
> > > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > > standard) may not be sent.
> > > > > > > > > > 
> > > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > > the address-claim message (but it may also be lower than that), and this
> > > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > > from sending it.
> > > > > > > > > > 
> > > > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > 
> > > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > > > 
> > > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > > explicitly stated.
> > > > > > > > The following is taken from ISO 11783-5:
> > > > > > > > 
> > > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > > 
> > > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > address".
> > > > > > > > 
> > > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > > 
> > > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > > resume normal operation because its address is still disputed.
> > > > > > > > 
> > > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > > 
> > > > > > > > > This effectively lets the offending CF DoS the affected CF if it keeps
> > > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > > about this.
> > > > > > > > 
> > > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > > 
> > > > > > > > > The networking layer is prohibiting the DM1 message from being sent, and the
> > > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > > 
> > > > > > > > Best regards,
> > > > > > > > 
> > > > > > > >     
> > > > > > > 
> > > > > > > Hi David,
> > > > > > > 
> > > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > > that should be applied in this particular case for the following
> > > > > > > reasons:
> > > > > > > 
> > > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > > "The CF shall claim its own address when initializing and when
> > > > > > > responding to a command to change its NAME or address" and this seems to  
> > > > > > 
> > > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > > need to claim its address again in that case.
> > > > > > 
> > > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > > address claimed."
> > > > > > 
> > > > > > So we basically have two situations when this will apply after the network is
> > > > > > up and running and a new node suddenly appears:
> > > > > > 
> > > > > >  1. The new node starts with a "Request for address claimed" message, to
> > > > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > > > >  250ms.
> > > > > > 
> > > > > > or
> > > > > > 
> > > > > >  2. The new node creates an addressing conflict either by claiming its address
> > > > > >  without first sending a "request for address claimed" message or (and this is
> > > > > >  your case) simply using its address without claiming it first.
> > > > > > 
> > > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > > yourself.
> > > > > >   
> > > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > > SA".
> > > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > > that the initialization requirements should not apply in this case since
> > > > > > > all disputes were already resolved.  
> > > > > > 
> > > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > > correctly claim its address, but it was you who did not receive that message
> > > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > > your address again, but it will be contested. If you don't wait for the
> > > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > > should have changed your own address but failed to do so.
> > > > > >   
> > > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > > offending CF is spoofing its own address.  
> > > > > > 
> > > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > > consideration for malign or adversarial actors.
> > > > > > There are also a lot of possible corner cases that these standards
> > > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > > think this is the case with your test tool.
> > > > > >   
> > > > > > > This seems to make useless the
> > > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > > violation".  
> > > > > > 
> > > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > > broadcast it to the network at that moment.
> > > > > > 
> > > > > > Best regards,
> > > > > > 
> > > > > >   
> > > > > 
> > > > > Thank you for your feedback and explanation.
> > > > > I asked the customer to contact the compliance company so that we can
> > > > > verify with them this particular use-case. I want to understand if there
> > > > > is an application note or exception that states how to manage it or if
> > > > > they implemented the test basing it on their own interpretation and how
> > > > > it really works: supposing that the test does not check the DM1
> > > > > presence, then the test could be passed even without sending the DM1
> > > > > > message during the 250ms after the address-claimed message.
> > > > > 
> > > > > Best regards,
> > > > > Devid  
> > > > 
> > > > Hi David, all,
> > > > 
> > > > I'm sorry for resuming this discussion after a long time but I noticed
> > > > that the driver forces the 250 ms wait even when responding to a request
> > > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > > Address claim requirements":
> > > > 
> > > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > > after it has successfully claimed  an  address  (see Figure 4), except
> > > > when responding to a request for address-claimed.
> > > > 
> > > > IMHO the driver shall be able to detect above condition or shall not
> > > > force the 250 ms wait which should then be implemented, depending on the
> > > > case, on user-space application side.
> > > 
> > > I am a bit out of the loop with this driver, but I think what you say is
> > > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > > after sending an Address Claimed message. It should specifically NOT do so if
> > > it is just responding to a Request for Address Claimed message.
> > > 
> > > So if it is indeed so, that the J1939 stack will hold off sending messages
> > > forcibly after sending an Address Claimed message as a reply to a Request for
> > > Address Claimed, then I'd say this is a bug.
> > > 
> > > @Oleksij, can you confirm this?
> > 
> > > I do not see any code path inside of the j1939 stack preventing you from
> > > sending anything by address. The only part which cares about address
> > > claiming is net/can/j1939/address-claim.c and it will just not be able
> > > to resolve name to address, because address claiming was not finished
> > > yet. In other words, if you need to respond to a request for
> > > address-claimed, then just send it by using address instead of name.
> > 
> > Regards,
> > Oleksij
> 
> Hi Oleksij,
> I'm sorry but I think I don't understand your proposal.
> 
> If I send an address-claimed message binding the socket without the name
> (can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
> EPROTO.
> If I send the address-claimed message binding the socket with the name,
> then the address-claimed message is sent successfully but other messages
> sent within 250 ms are not sent (error EADDRNOTAVAIL).

What kind of other messages are you trying to send?

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18 12:30                         ` Oleksij Rempel
@ 2022-11-18 12:41                           ` Devid Antonio Filoni
  2022-11-18 13:44                             ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-18 12:41 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: David Jander, Kurt Van Dijck, Robin van der Gracht, kernel,
	linux-can, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde,
	David S. Miller, Jakub Kicinski, Paolo Abeni, Maxime Jayat,
	kbuild test robot, netdev, linux-kernel

On Fri, 2022-11-18 at 13:30 +0100, Oleksij Rempel wrote:
> On Fri, Nov 18, 2022 at 11:25:04AM +0100, Devid Antonio Filoni wrote:
> > On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> > > On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > > > On Thu, 17 Nov 2022 15:08:20 +0100
> > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > 
> > > > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > > > Hi David,
> > > > > > 
> > > > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > > > > Hi Devid,
> > > > > > > 
> > > > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > > > Devid Antonio Filoni <
> > > > > > > devid.filoni@egluetechnologies.com  
> > > > > > > > wrote:  
> > > > > > >   
> > > > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > > > Oleksij Rempel <
> > > > > > > > > o.rempel@pengutronix.de
> > > > > > > > >     
> > > > > > > > > > wrote:    
> > > > > > > > > 
> > > > > > > > >     
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > > > > 
> > > > > > > > > Thanks for the CC.
> > > > > > > > >     
> > > > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > > > > 
> > > > > > > > > > > > > It will be interesting to know which certification tool does not expect it and
> > > > > > > > > > > > what explanation is used if it fails?
> > > > > > > > > > > >       
> > > > > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid contesting the same address.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Would decreasing the dead time to 50msec help in such a case?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > > Kurt
> > > > > > > > > > > > >       
> > > > > > > > > > > > 
> > > > > > > > > > > >       
> > > > > > > > > > > 
> > > > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > > > > claimed by CF #1.
> > > > > > > > > > > 
> > > > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > > > 31
> > > > > > > > > > > 
> > > > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > > > standard:
> > > > > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > > > > the claim procedure with a new address
> > > > > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF #2,
> > > > > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > > > 
> > > > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > > > standard) may not be sent.
> > > > > > > > > > > 
> > > > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > > > > the address-claim message (but it may also be lower than that), and this
> > > > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > > > from sending it.
> > > > > > > > > > > 
> > > > > > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > 
> > > > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > > > > 
> > > > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > > > explicitly stated.
> > > > > > > > > The following is taken from ISO 11783-5:
> > > > > > > > > 
> > > > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > > > 
> > > > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > address".
> > > > > > > > > 
> > > > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > > > 
> > > > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > > > resume normal operation because its address is still disputed.
> > > > > > > > > 
> > > > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > > > 
> > > > > > > > > > This effectively lets the offending CF DoS the affected CF if it keeps
> > > > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > > > about this.
> > > > > > > > > 
> > > > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > > > 
> > > > > > > > > > The networking layer is prohibiting the DM1 message from being sent, and the
> > > > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > > > 
> > > > > > > > > Best regards,
> > > > > > > > > 
> > > > > > > > >     
> > > > > > > > 
> > > > > > > > Hi David,
> > > > > > > > 
> > > > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > > > that should be applied in this particular case for the following
> > > > > > > > reasons:
> > > > > > > > 
> > > > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > > > "The CF shall claim its own address when initializing and when
> > > > > > > > responding to a command to change its NAME or address" and this seems to  
> > > > > > > 
> > > > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > > > need to claim its address again in that case.
> > > > > > > 
> > > > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > > > address claimed."
> > > > > > > 
> > > > > > > So we basically have two situations when this will apply after the network is
> > > > > > > up and running and a new node suddenly appears:
> > > > > > > 
> > > > > > >  1. The new node starts with a "Request for address claimed" message, to
> > > > > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > > > > >  250ms.
> > > > > > > 
> > > > > > > or
> > > > > > > 
> > > > > > >  2. The new node creates an addressing conflict either by claiming its address
> > > > > > >  without first sending a "request for address claimed" message or (and this is
> > > > > > >  your case) simply using its address without claiming it first.
> > > > > > > 
> > > > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > > > yourself.
> > > > > > >   
> > > > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > > > SA".
> > > > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > > > that the initialization requirements should not apply in this case since
> > > > > > > > all disputes were already resolved.  
> > > > > > > 
> > > > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > > > correctly claim its address, but it was you who did not receive that message
> > > > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > > > your address again, but it will be contested. If you don't wait for the
> > > > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > > > should have changed your own address but failed to do so.
> > > > > > >   
> > > > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > > > offending CF is spoofing its own address.  
> > > > > > > 
> > > > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > > > consideration for malign or adversarial actors.
> > > > > > > There are also a lot of possible corner cases that these standards
> > > > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > > > think this is the case with your test tool.
> > > > > > >   
> > > > > > > > This seems to make useless the
> > > > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > > > violation".  
> > > > > > > 
> > > > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > > > broadcast it to the network at that moment.
> > > > > > > 
> > > > > > > Best regards,
> > > > > > > 
> > > > > > >   
> > > > > > 
> > > > > > Thank you for your feedback and explanation.
> > > > > > I asked the customer to contact the compliance company so that we can
> > > > > > verify with them this particular use-case. I want to understand if there
> > > > > > is an application note or exception that states how to manage it or if
> > > > > > they implemented the test basing it on their own interpretation and how
> > > > > > it really works: supposing that the test does not check the DM1
> > > > > > presence, then the test could be passed even without sending the DM1
> > > > > > > message during the 250ms after the address-claimed message.
> > > > > > 
> > > > > > Best regards,
> > > > > > Devid  
> > > > > 
> > > > > Hi David, all,
> > > > > 
> > > > > I'm sorry for resuming this discussion after a long time but I noticed
> > > > > that the driver forces the 250 ms wait even when responding to a request
> > > > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > > > Address claim requirements":
> > > > > 
> > > > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > > > after it has successfully claimed  an  address  (see Figure 4), except
> > > > > when responding to a request for address-claimed.
> > > > > 
> > > > > IMHO the driver shall be able to detect above condition or shall not
> > > > > force the 250 ms wait which should then be implemented, depending on the
> > > > > case, on user-space application side.
> > > > 
> > > > I am a bit out of the loop with this driver, but I think what you say is
> > > > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > > > after sending an Address Claimed message. It should specifically NOT do so if
> > > > it is just responding to a Request for Address Claimed message.
> > > > 
> > > > So if it is indeed so, that the J1939 stack will hold off sending messages
> > > > forcibly after sending an Address Claimed message as a reply to a Request for
> > > > Address Claimed, then I'd say this is a bug.
> > > > 
> > > > @Oleksij, can you confirm this?
> > > 
> > > > I do not see any code path inside of the j1939 stack preventing you from
> > > > sending anything by address. The only part which cares about address
> > > > claiming is net/can/j1939/address-claim.c and it will just not be able
> > > > to resolve name to address, because address claiming was not finished
> > > > yet. In other words, if you need to respond to a request for
> > > > address-claimed, then just send it by using address instead of name.
> > > 
> > > Regards,
> > > Oleksij
> > 
> > Hi Oleksij,
> > I'm sorry but I think I don't understand your proposal.
> > 
> > If I send an address-claimed message binding the socket without the name
> > (can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
> > EPROTO.
> > If I send the address-claimed message binding the socket with the name,
> > then the address-claimed message is sent successfully but other messages
> > sent within 250 ms are not sent (error EADDRNOTAVAIL).
> 
> What kind of other messages are you trying to send?
> 
> Regards,
> Oleksij

Hi,
the application sends the DM1 (0xFECA) every second; meanwhile it
receives a request for address-claimed message and answers with the
address-claimed message.
If the DM1 is sent within 250 ms after the address-claimed message, then
it is rejected with error EADDRNOTAVAIL.
Since the driver is performing the claim each time the address-claimed
message is sent (even if it is a response to a request for address-
claimed), the EADDRNOTAVAIL error is expected in the 250 ms time window.
So, when a request for address-claimed message is received:
- You cannot send an address-claimed message with the socket bound with
J1939_NO_NAME because it is rejected with error EPROTO
- You can send an address-claimed message with the socket bound with the
name but you won't be able to send other messages within 250 ms because
they are rejected with error EADDRNOTAVAIL and this is against point d)
of ISO 11783-5 "4.5.2 Address claim requirements".

Best Regards,
Devid





* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18 12:41                           ` Devid Antonio Filoni
@ 2022-11-18 13:44                             ` Oleksij Rempel
  2022-11-18 15:12                               ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-18 13:44 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Oliver Hartkopp, Kurt Van Dijck, kbuild test robot, Maxime Jayat,
	Robin van der Gracht, linux-kernel, Oleksij Rempel, Paolo Abeni,
	Marc Kleine-Budde, kernel, David Jander, Jakub Kicinski, netdev,
	linux-can, David S. Miller

On Fri, Nov 18, 2022 at 01:41:05PM +0100, Devid Antonio Filoni wrote:
> On Fri, 2022-11-18 at 13:30 +0100, Oleksij Rempel wrote:
> > On Fri, Nov 18, 2022 at 11:25:04AM +0100, Devid Antonio Filoni wrote:
> > > On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> > > > On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > > > > On Thu, 17 Nov 2022 15:08:20 +0100
> > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > > 
> > > > > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > > > > Hi David,
> > > > > > > 
> > > > > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > > > > > Hi Devid,
> > > > > > > > 
> > > > > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > > > > Devid Antonio Filoni <
> > > > > > > > devid.filoni@egluetechnologies.com  
> > > > > > > > > wrote:  
> > > > > > > >   
> > > > > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > > > > Oleksij Rempel <
> > > > > > > > > > o.rempel@pengutronix.de
> > > > > > > > > >     
> > > > > > > > > > > wrote:    
> > > > > > > > > > 
> > > > > > > > > >     
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > > > > > 
> > > > > > > > > > Thanks for the CC.
> > > > > > > > > >     
> > > > > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > > > > > 
> > > > > > > > > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > > > > > > > > what explanation is used if it fails?
> > > > > > > > > > > > >       
> > > > > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid contesting the same address.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Would decreasing the dead time to 50msec help in such a case?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > > > Kurt
> > > > > > > > > > > > > >       
> > > > > > > > > > > > > 
> > > > > > > > > > > > >       
> > > > > > > > > > > > 
> > > > > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > > > > > claimed by CF #1.
> > > > > > > > > > > > 
> > > > > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > > > > 31
> > > > > > > > > > > > 
> > > > > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > > > > standard:
> > > > > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > > > > the claim procedure with a new address
> > > > > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF #2,
> > > > > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > > > > 
> > > > > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > > > > standard) may not be sent.
> > > > > > > > > > > > 
> > > > > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > > > > the address-claim message (but it may also be less than that), and this
> > > > > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > > > > from sending it.
> > > > > > > > > > > > 
> > > > > > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > 
> > > > > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > > > > > 
> > > > > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > > > > explicitly stated.
> > > > > > > > > > The following is taken from ISO 11783-5:
> > > > > > > > > > 
> > > > > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > > > > 
> > > > > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > > address".
> > > > > > > > > > 
> > > > > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > > > > 
> > > > > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > > > > resume normal operation because its address is still disputed.
> > > > > > > > > > 
> > > > > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > > > > 
> > > > > > > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > > > > about this.
> > > > > > > > > > 
> > > > > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > > > > 
> > > > > > > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > > > > 
> > > > > > > > > > Best regards,
> > > > > > > > > > 
> > > > > > > > > >     
> > > > > > > > > 
> > > > > > > > > Hi David,
> > > > > > > > > 
> > > > > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > > > > that should be applied in this particular case for the following
> > > > > > > > > reasons:
> > > > > > > > > 
> > > > > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > > > > "The CF shall claim its own address when initializing and when
> > > > > > > > > responding to a command to change its NAME or address" and this seems to  
> > > > > > > > 
> > > > > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > > > > need to claim its address again in that case.
> > > > > > > > 
> > > > > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > > > > address claimed."
> > > > > > > > 
> > > > > > > > So we basically have two situations when this will apply after the network is
> > > > > > > > up and running and a new node suddenly appears:
> > > > > > > > 
> > > > > > > >  1. The new node starts with a "Request for address claimed" message, to
> > > > > > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > > > > > >  250ms.
> > > > > > > > 
> > > > > > > > or
> > > > > > > > 
> > > > > > > >  2. The new node creates an addressing conflict either by claiming its address
> > > > > > > >  without first sending a "request for address claimed" message or (and this is
> > > > > > > >  your case) simply using its address without claiming it first.
> > > > > > > > 
> > > > > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > > > > yourself.
> > > > > > > >   
> > > > > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > > > > SA".
> > > > > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > > > > that the initialization requirements should not apply in this case since
> > > > > > > > > all disputes were already resolved.  
> > > > > > > > 
> > > > > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > > > > correctly claim its address, but it was you who did not receive that message
> > > > > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > > > > your address again, but it will be contested. If you don't wait for the
> > > > > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > > > > should have changed your own address but failed to do so.
> > > > > > > >   
> > > > > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > > > > offending CF is spoofing its own address.  
> > > > > > > > 
> > > > > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > > > > consideration for malign or adversarial actors.
> > > > > > > > There are also a lot of possible corner cases that these standards
> > > > > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > > > > think this is the case with your test tool.
> > > > > > > >   
> > > > > > > > > This seems to make useless the
> > > > > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > > > > violation".  
> > > > > > > > 
> > > > > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > > > > broadcast it to the network at that moment.
> > > > > > > > 
> > > > > > > > Best regards,
> > > > > > > > 
> > > > > > > >   
> > > > > > > 
> > > > > > > Thank you for your feedback and explanation.
> > > > > > > I asked the customer to contact the compliance company so that we can
> > > > > > > verify with them this particular use-case. I want to understand if there
> > > > > > > is an application note or exception that states how to manage it or if
> > > > > > > they implemented the test basing it on their own interpretation and how
> > > > > > > it really works: supposing that the test does not check the DM1
> > > > > > > presence, then the test could be passed even without sending the DM1
> > > > > > > message during the 250ms after the address-claimed message.
> > > > > > > 
> > > > > > > Best regards,
> > > > > > > Devid  
> > > > > > 
> > > > > > Hi David, all,
> > > > > > 
> > > > > > I'm sorry for resuming this discussion after a long time but I noticed
> > > > > > that the driver forces the 250 ms wait even when responding to a request
> > > > > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > > > > Address claim requirements":
> > > > > > 
> > > > > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > > > > after it has successfully claimed  an  address  (see Figure 4), except
> > > > > > when responding to a request for address-claimed.
> > > > > > 
> > > > > > IMHO the driver shall be able to detect above condition or shall not
> > > > > > force the 250 ms wait which should then be implemented, depending on the
> > > > > > case, on user-space application side.
> > > > > 
> > > > > I am a bit out of the loop with this driver, but I think what you say is
> > > > > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > > > > after sending an Address Claimed message. It should specifically NOT do so if
> > > > > it is just responding to a Request for Address Claimed message.
> > > > > 
> > > > > So if it is indeed so, that the J1939 stack will hold off sending messages
> > > > > forcibly after sending an Address Claimed message as a reply to a Request for
> > > > > Address Claimed, then I'd say this is a bug.
> > > > > 
> > > > > @Oleksij, can you confirm this?
> > > > 
> > > > I do not see any code path inside the j1939 stack that prevents you from
> > > > sending anything by address. The only part which cares about address
> > > > claiming is net/can/j1939/address-claim.c, and it will just not be able
> > > > to resolve the name to an address, because address claiming has not finished
> > > > yet. In other words, if you need to send a response to a request for
> > > > address-claimed, then just send it by using the address instead of the name.
> > > > 
> > > > Regards,
> > > > Oleksij
> > > 
> > > Hi Oleksij,
> > > I'm sorry but I think I don't understand your proposal.
> > > 
> > > If I send an address-claimed message binding the socket without the name
> > > (can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
> > > EPROTO.
> > > If I send the address-claimed message binding the socket with the name,
> > > then the address-claimed message is sent successfully but other messages
> > > sent within 250 ms are not sent (error EADDRNOTAVAIL).
> > 
> > What kind of other messages are your trying to send?
> > 
> > Regards,
> > Oleksij
> 
> Hi,
> the application sends each second the DM1 (0xFECA), meanwhile it
> receives an request for address-claimed message and it answers with the
> address-claimed message.
> If the DM1 is sent within 250 ms after the address-claimed message, then
> it is rejected with error EADDRNOTAVAIL.
> Since the driver is performing the claim each time the address-claimed
> message is sent (even if it is a response to a request for address-
> claimed), the EADDRNOTAVAIL error is expected in the 250 ms time window.
> So, when a request for address-claimed message is received:
> - You cannot send an address-claimed message with the socket bound with
> J1939_NO_NAME because it is rejected with error EPROTO
> - You can send an address-claimed message with the socket bound with the
> name but you won't be able to send other messages within 250 ms because
> they are rejected with error EADDRNOTAVAIL and this is against point d)
> of ISO 11783-5 "4.5.2 Address claim requirements".

OK, now I finally understand it.

If I see it correctly, it is hard to fix the second part of "ISO 11783-5
4.5.2 d)" without breaking the first part of the same point.

How can I tell the difference between a plain AC (address-claimed) and an
AC sent as a response to an RfAC (request for address-claimed)? Wait 250 ms?
What if some system starts up just in that window and sends a plain AC?
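
For illustration only, here is a minimal user-space sketch in C of one way
the two cases could be told apart: remember that a Request for Address
Claimed (PGN 0xEA00 asking for PGN 0xEE00) was received, and treat the next
AC we send as a response that does not start a new 250 ms wait. All names
here (j1939_id, ac_state, ac_on_rx, ac_on_tx_claim, ac_may_transmit) are
hypothetical and are not part of net/can/j1939. The sketch does not address
the second concern, a node that starts up inside the window and sends a
plain AC.

```c
#include <stdbool.h>
#include <stdint.h>

#define PGN_REQUEST         0x0EA00u /* Request PGN (59904) */
#define PGN_ADDRESS_CLAIMED 0x0EE00u /* Address Claimed PGN (60928) */
#define ADDR_GLOBAL         0xFFu
#define AC_HOLDOFF_MS       250u     /* ISO 11783-5 4.5.2 d) */

/* Hypothetical helper type: decoded fields of a 29-bit J1939 identifier. */
struct j1939_id {
	uint32_t pgn;
	uint8_t da; /* destination address (global for PDU2) */
	uint8_t sa; /* source address */
};

static struct j1939_id j1939_decode(uint32_t can_id)
{
	struct j1939_id id;
	uint8_t pf = (can_id >> 16) & 0xFF; /* PDU format */
	uint8_t ps = (can_id >> 8) & 0xFF;  /* PDU specific */

	id.sa = can_id & 0xFF;
	id.pgn = (can_id >> 8) & 0x3FF00u;  /* EDP, DP and PF */
	if (pf < 240) {
		id.da = ps;          /* PDU1: PS is the destination */
	} else {
		id.da = ADDR_GLOBAL; /* PDU2: PS belongs to the PGN */
		id.pgn |= ps;
	}
	return id;
}

/* Hypothetical per-address state kept by the application. */
struct ac_state {
	bool rfac_pending;         /* an RfAC is waiting for our AC */
	bool holdoff_active;       /* a plain AC started the 250 ms wait */
	uint64_t holdoff_start_ms;
};

/* Feed every received frame; notes an RfAC aimed at us or at everyone. */
static void ac_on_rx(struct ac_state *st, uint32_t can_id,
		     const uint8_t *data, uint8_t len, uint8_t own_sa)
{
	struct j1939_id id = j1939_decode(can_id);
	uint32_t requested;

	if (id.pgn != PGN_REQUEST || len < 3)
		return;
	if (id.da != own_sa && id.da != ADDR_GLOBAL)
		return;
	/* The requested PGN is carried LSB first in the first 3 bytes. */
	requested = data[0] | ((uint32_t)data[1] << 8) |
		    ((uint32_t)data[2] << 16);
	if (requested == PGN_ADDRESS_CLAIMED)
		st->rfac_pending = true;
}

/* Call when our own AC is transmitted. */
static void ac_on_tx_claim(struct ac_state *st, uint64_t now_ms)
{
	if (st->rfac_pending) {
		st->rfac_pending = false; /* response to RfAC: no new wait */
		return;
	}
	st->holdoff_active = true;        /* plain AC: start the wait */
	st->holdoff_start_ms = now_ms;
}

/* True if messages other than AC may be sent at now_ms. */
static bool ac_may_transmit(const struct ac_state *st, uint64_t now_ms)
{
	return !st->holdoff_active ||
	       now_ms - st->holdoff_start_ms >= AC_HOLDOFF_MS;
}
```

With the identifiers from the trace quoted earlier in this thread,
j1939_decode(0x18EEFFF0) gives PGN 0xEE00, DA 0xFF, SA 0xF0, and
j1939_decode(0x18FECAF0) gives PGN 0xFECA (DM1).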

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18 13:44                             ` Oleksij Rempel
@ 2022-11-18 15:12                               ` Devid Antonio Filoni
  2022-11-19 10:12                                 ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-18 15:12 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Oliver Hartkopp, Kurt Van Dijck, kbuild test robot, Maxime Jayat,
	Robin van der Gracht, linux-kernel, Oleksij Rempel, Paolo Abeni,
	Marc Kleine-Budde, kernel, David Jander, Jakub Kicinski, netdev,
	linux-can, David S. Miller

On Fri, 2022-11-18 at 14:44 +0100, Oleksij Rempel wrote:
> On Fri, Nov 18, 2022 at 01:41:05PM +0100, Devid Antonio Filoni wrote:
> > On Fri, 2022-11-18 at 13:30 +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 18, 2022 at 11:25:04AM +0100, Devid Antonio Filoni wrote:
> > > > On Fri, 2022-11-18 at 07:06 +0100, Oleksij Rempel wrote:
> > > > > On Thu, Nov 17, 2022 at 04:22:51PM +0100, David Jander wrote:
> > > > > > On Thu, 17 Nov 2022 15:08:20 +0100
> > > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > > > 
> > > > > > > On Fri, 2022-05-13 at 11:46 +0200, Devid Antonio Filoni wrote:
> > > > > > > > Hi David,
> > > > > > > > 
> > > > > > > > On Wed, 2022-05-11 at 16:22 +0200, David Jander wrote:  
> > > > > > > > > Hi Devid,
> > > > > > > > > 
> > > > > > > > > On Wed, 11 May 2022 14:55:04 +0200
> > > > > > > > > Devid Antonio Filoni <devid.filoni@egluetechnologies.com> wrote:
> > > > > > > > >   
> > > > > > > > > > On Wed, 2022-05-11 at 11:06 +0200, David Jander wrote:  
> > > > > > > > > > > Hi,
> > > > > > > > > > > 
> > > > > > > > > > > On Wed, 11 May 2022 10:47:28 +0200
> > > > > > > > > > > Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > > > > > > > > > > 
> > > > > > > > > > >     
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > 
> > > > > > > > > > > > i'll CC more J1939 users to the discussion.    
> > > > > > > > > > > 
> > > > > > > > > > > Thanks for the CC.
> > > > > > > > > > >     
> > > > > > > > > > > > On Tue, May 10, 2022 at 01:00:41PM +0200, Devid Antonio Filoni wrote:    
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Tue, 2022-05-10 at 06:26 +0200, Oleksij Rempel wrote:      
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Mon, May 09, 2022 at 09:04:06PM +0200, Kurt Van Dijck wrote:      
> > > > > > > > > > > > > > > On ma, 09 mei 2022 19:03:03 +0200, Devid Antonio Filoni wrote:      
> > > > > > > > > > > > > > > > This is not explicitly stated in SAE J1939-21 and some tools used for
> > > > > > > > > > > > > > > > ISO-11783 certification do not expect this wait.      
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > It will be interesting to know which certification tool do not expect it and
> > > > > > > > > > > > > > what explanation is used if it fails?
> > > > > > > > > > > > > >       
> > > > > > > > > > > > > > > IMHO, the current behaviour is not explicitly stated, but nor is the opposite.
> > > > > > > > > > > > > > > And if I'm not mistaken, this introduces a 250msec delay.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 1. If you want to avoid the 250msec gap, you should avoid contesting the same address.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 2. It's a balance between predictability and flexibility, but if you try to accomplish both,
> > > > > > > > > > > > > > > as your patch suggests, there is slight time-window until the current owner responds,
> > > > > > > > > > > > > > > in which it may be confusing which node has the address. It depends on how much history
> > > > > > > > > > > > > > > you have collected on the bus.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I'm sure that this problem decreases with increasing processing power on the nodes,
> > > > > > > > > > > > > > > but bigger internal queues also increase this window.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > It would certainly help if you describe how the current implementation fails.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Would decreasing the dead time to 50msec help in such a case?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Kind regards,
> > > > > > > > > > > > > > > Kurt
> > > > > > > > > > > > > > >       
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >       
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The test that is being executed during the ISOBUS compliance is the
> > > > > > > > > > > > > following: after an address has been claimed by a CF (#1), another CF
> > > > > > > > > > > > > (#2) sends a  message (other than address-claim) using the same address
> > > > > > > > > > > > > claimed by CF #1.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > As per ISO11783-5 standard, if a CF receives a message, other than the
> > > > > > > > > > > > > address-claimed message, which uses the CF's own SA, then the CF (#1):
> > > > > > > > > > > > > - shall send the address-claim message to the Global address;
> > > > > > > > > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > > > > > > > > > 31
> > > > > > > > > > > > > 
> > > > > > > > > > > > > After the address-claim message is sent by CF #1, as per ISO11783-5
> > > > > > > > > > > > > standard:
> > > > > > > > > > > > > - If the name of the CF #1 has a lower priority than the one of the CF
> > > > > > > > > > > > > #2, then the CF #2 shall send its address-claim message and thus the CF
> > > > > > > > > > > > > #1 shall send the cannot-claim-address message or shall execute again
> > > > > > > > > > > > > the claim procedure with a new address
> > > > > > > > > > > > > - If the name of the CF #1 has a higher priority than the one of the CF #2,
> > > > > > > > > > > > > then the CF #2 shall send the cannot-claim-address message or shall
> > > > > > > > > > > > > execute the claim procedure with a new address
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Above conflict management is OK with current J1939 driver
> > > > > > > > > > > > > implementation, however, since the driver always waits 250ms after
> > > > > > > > > > > > > sending an address-claim message, the CF #1 cannot set the DTC. The DM1
> > > > > > > > > > > > > message which is expected to be sent each second (as per J1939-73
> > > > > > > > > > > > > standard) may not be sent.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Honestly, I don't know which company is doing the ISOBUS compliance
> > > > > > > > > > > > > tests on our products and which tool they use as it was chosen by our
> > > > > > > > > > > > > customer, however they did send us some CAN traces of previously
> > > > > > > > > > > > > performed tests and we noticed that the DM1 message is sent 160ms after
> > > > > > > > > > > > > the address-claim message (but it may also be less than that), and this
> > > > > > > > > > > > > is something that we cannot do because the driver blocks the application
> > > > > > > > > > > > > from sending it.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 28401.127146 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > > 28401.167414 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > > 28401.349214 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 01 FF FF  //DM1
> > > > > > > > > > > > > 28402.155774 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > > 28402.169455 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > > 28402.348226 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 02 FF FF  //DM1
> > > > > > > > > > > > > 28403.182753 1  18E6FFF0x    Tx   d 8 FE 26 FF FF FF FF FF FF  //Message with other CF's address
> > > > > > > > > > > > > 28403.188648 1  18EEFFF0x    Rx   d 8 15 76 D1 0B 00 86 00 A0  //Address Claim - SA = F0
> > > > > > > > > > > > > 28403.349328 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > > 28404.349406 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > > 28405.349740 1  18FECAF0x    Rx   d 8 FF FF C0 08 1F 03 FF FF  //DM1
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Since the 250ms wait is not explicitly stated, IMHO it should be up to
> > > > > > > > > > > > > the user-space implementation to decide how to manage it.    
> > > > > > > > > > > 
> > > > > > > > > > > I think this is not entirely correct. AFAICS the 250ms wait is indeed
> > > > > > > > > > > explicitly stated.
> > > > > > > > > > > The following is taken from ISO 11783-5:
> > > > > > > > > > > 
> > > > > > > > > > > In "4.4.4.3 Address violation" it states that "If a CF receives a message,
> > > > > > > > > > > other than the address-claimed message, which uses the CF’s own SA, then the
> > > > > > > > > > > CF [...] shall send the address-claim message to the Global address."
> > > > > > > > > > > 
> > > > > > > > > > > So the CF shall claim its address again. But further down, in "4.5.2 Address
> > > > > > > > > > > claim requirements" it is stated that "...No CF shall begin, or resume,
> > > > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > > > address".
> > > > > > > > > > > 
> > > > > > > > > > > At this moment, the address is in dispute. The affected CFs are not allowed to
> > > > > > > > > > > send any other messages until this dispute is resolved, and the standard
> > > > > > > > > > > requires a waiting time of 250ms which is minimally deemed necessary to give
> > > > > > > > > > > all participants time to respond and eventually dispute the address claim.
> > > > > > > > > > > 
> > > > > > > > > > > If the offending CF ignores this dispute and keeps sending incorrect messages
> > > > > > > > > > > faster than every 250ms, then effectively the other CF has no chance to ever
> > > > > > > > > > > resume normal operation because its address is still disputed.
> > > > > > > > > > > 
> > > > > > > > > > > According to 4.4.4.3 it is also required to set a DTC, but it will not be
> > > > > > > > > > > allowed to send the DM1 message unless the address dispute is resolved.
> > > > > > > > > > > 
> > > > > > > > > > > This effectively leads to the offending CF to DoS the affected CF if it keeps
> > > > > > > > > > > sending offending messages. Unfortunately neither J1939 nor ISObus takes into
> > > > > > > > > > > account adversarial behavior on the CAN network, so we cannot do anything
> > > > > > > > > > > about this.
> > > > > > > > > > > 
> > > > > > > > > > > As for the ISObus compliance tool that is mentioned by Devid, IMHO this
> > > > > > > > > > > compliance tool should be challenged and fixed, since it is broken.
> > > > > > > > > > > 
> > > > > > > > > > > The networking layer is prohibiting the DM1 message to be sent, and the
> > > > > > > > > > > networking layer has precedence above all superior protocol layers, so the
> > > > > > > > > > > diagnostics layer is not able to operate at this moment.
> > > > > > > > > > > 
> > > > > > > > > > > Best regards,
> > > > > > > > > > > 
> > > > > > > > > > >     
> > > > > > > > > > 
> > > > > > > > > > Hi David,
> > > > > > > > > > 
> > > > > > > > > > I get your point but I'm not sure that it is the correct interpretation
> > > > > > > > > > that should be applied in this particular case for the following
> > > > > > > > > > reasons:
> > > > > > > > > > 
> > > > > > > > > > - In "4.5.2 Address claim requirements" it is explicitly stated that
> > > > > > > > > > "The CF shall claim its own address when initializing and when
> > > > > > > > > > responding to a command to change its NAME or address" and this seems to  
> > > > > > > > > 
> > > > > > > > > The standard unfortunately has a track record of ignoring a lot of scenarios
> > > > > > > > > and corner cases, like in this instance the fact that there can appear new
> > > > > > > > > participants on the bus _after_ initialization has long finished, and it would
> > > > > > > > > need to claim its address again in that case.
> > > > > > > > > 
> > > > > > > > > But look at point d) of that same section: "No CF shall begin, or resume,
> > > > > > > > > transmission on the network until 250 ms after it has successfully claimed an
> > > > > > > > > address (Figure 4). This does not apply when responding to a request for
> > > > > > > > > address claimed."
> > > > > > > > > 
> > > > > > > > > So we basically have two situations when this will apply after the network is
> > > > > > > > > up and running and a new node suddenly appears:
> > > > > > > > > 
> > > > > > > > >  1. The new node starts with a "Request for address claimed" message, to
> > > > > > > > >  which your CF should respond with an "Address Claimed" message and NOT wait
> > > > > > > > >  250ms.
> > > > > > > > > 
> > > > > > > > > or
> > > > > > > > > 
> > > > > > > > >  2. The new node creates an addressing conflict either by claiming its address
> > > > > > > > >  without first sending a "request for address claimed" message or (and this is
> > > > > > > > >  your case) simply using its address without claiming it first.
> > > > > > > > > 
> > > > > > > > > It is this second possibility where there is a conflict that must be resolved,
> > > > > > > > > and then you must wait 250ms after claiming the conflicting address for
> > > > > > > > > yourself.
> > > > > > > > >   
> > > > > > > > > > completely ignore the "4.4.4.3 Address violation" that states that the
> > > > > > > > > > address-claimed message shall be sent also when "the CF receives a
> > > > > > > > > > message, other than the address-claimed message, which uses the CF's own
> > > > > > > > > > SA".
> > > > > > > > > > Please note that the address was already claimed by the CF, so I think
> > > > > > > > > > that the initialization requirements should not apply in this case since
> > > > > > > > > > all disputes were already resolved.  
> > > > > > > > > 
> > > > > > > > > Well, yes and no. The address was claimed before, yes, but then a new node came
> > > > > > > > > onto the bus and disputed that address. In that case the dispute needs to be
> > > > > > > > > resolved first. Imagine you would NOT wait 250ms, but the other CF did
> > > > > > > > > correctly claim its address, but it was you who did not receive that message
> > > > > > > > > for some reason. Now also assume that your own NAME has a lower priority than
> > > > > > > > > the other CF. In this case you can send a "claimed address" message to claim
> > > > > > > > > your address again, but it will be contested. If you don't wait for the
> > > > > > > > > contestant, it is you who will be in violation of the protocol, because you
> > > > > > > > > should have changed your own address but failed to do so.
> > > > > > > > >   
> > > > > > > > > > - If the offending CF ignores the dispute, as you said, then the other
> > > > > > > > > > CF has no chance to ever resume normal operation and so the network
> > > > > > > > > > cannot be aware that the other CF is not working correctly because the
> > > > > > > > > > offending CF is spoofing its own address.  
> > > > > > > > > 
> > > > > > > > > Correct. And like I said in my previous reply, this is unfortunately how CAN,
> > > > > > > > > J1939 and ISObus work. The whole network must cooperate and there is no
> > > > > > > > > consideration for malign or adversarial actors.
> > > > > > > > > There are also a lot of possible corner cases that these standards
> > > > > > > > > unfortunately do not take into account. Conformance test tools seem to be even
> > > > > > > > > more problematic and tend to have bugs quite often. I am still inclined to
> > > > > > > > > think this is the case with your test tool.
> > > > > > > > >   
> > > > > > > > > > This seems to make useless the
> > > > > > > > > > requirement that states to activate the DTC in "4.4.4.3 Address
> > > > > > > > > > violation".  
> > > > > > > > > 
> > > > > > > > > The requirement is not useless. You can still set and store the DTC, just not
> > > > > > > > > broadcast it to the network at that moment.
> > > > > > > > > 
> > > > > > > > > Best regards,
> > > > > > > > > 
> > > > > > > > >   
> > > > > > > > 
> > > > > > > > Thank you for your feedback and explanation.
> > > > > > > > I asked the customer to contact the compliance company so that we can
> > > > > > > > verify with them this particular use-case. I want to understand if there
> > > > > > > > is an application note or exception that states how to manage it or if
> > > > > > > > they implemented the test basing it on their own interpretation and how
> > > > > > > > it really works: supposing that the test does not check the DM1
> > > > > > > > presence, then the test could be passed even without sending the DM1
> > > > > > > > > message during the 250ms after the address-claimed message.
> > > > > > > > 
> > > > > > > > Best regards,
> > > > > > > > Devid  
> > > > > > > 
> > > > > > > Hi David, all,
> > > > > > > 
> > > > > > > I'm sorry for resuming this discussion after a long time but I noticed
> > > > > > > that the driver forces the 250 ms wait even when responding to a request
> > > > > > > for address-claimed which is against point d) of ISO 11783-5 "4.5.2
> > > > > > > Address claim requirements":
> > > > > > > 
> > > > > > > No CF shall begin, or resume, transmission on the network until 250 ms
> > > > > > > after it has successfully claimed  an  address  (see Figure 4), except
> > > > > > > when responding to a request for address-claimed.
> > > > > > > 
> > > > > > > IMHO the driver shall be able to detect above condition or shall not
> > > > > > > force the 250 ms wait which should then be implemented, depending on the
> > > > > > > case, on user-space application side.
> > > > > > 
> > > > > > I am a bit out of the loop with this driver, but I think what you say is
> > > > > > correct. The J1939 stack should NOT unconditionally stay silent for 250ms
> > > > > > after sending an Address Claimed message. It should specifically NOT do so if
> > > > > > it is just responding to a Request for Address Claimed message.
> > > > > > 
> > > > > > So if it is indeed so, that the J1939 stack will hold off sending messages
> > > > > > forcibly after sending an Address Claimed message as a reply to a Request for
> > > > > > Address Claimed, then I'd say this is a bug.
> > > > > > 
> > > > > > @Oleksij, can you confirm this?
> > > > > 
> > > > > > I do not see any code path inside the j1939 stack that prevents you
> > > > > > from sending anything by address. The only part that cares about address
> > > > > > claiming is net/can/j1939/address-claim.c, and it will just not be able
> > > > > > to resolve the name to an address because address claiming has not
> > > > > > finished yet. In other words, if you need to respond to a request for
> > > > > > address-claimed, then just send it by using the address instead of the name.
> > > > > 
> > > > > Regards,
> > > > > Oleksij
> > > > 
> > > > Hi Oleksij,
> > > > I'm sorry but I think I don't understand your proposal.
> > > > 
> > > > If I send an address-claimed message binding the socket without the name
> > > > (can_addr.j1939.name = J1939_NO_NAME), then the driver returns error
> > > > EPROTO.
> > > > If I send the address-claimed message binding the socket with the name,
> > > > then the address-claimed message is sent successfully but other messages
> > > > sent within 250 ms are not sent (error EADDRNOTAVAIL).
> > > 
> > > What kind of other messages are your trying to send?
> > > 
> > > Regards,
> > > Oleksij
> > 
> > Hi,
> > the application sends each second the DM1 (0xFECA), meanwhile it
> > receives an request for address-claimed message and it answers with the
> > address-claimed message.
> > If the DM1 is sent within 250 ms after the address-claimed message, then
> > it is rejected with error EADDRNOTAVAIL.
> > Since the driver is performing the claim each time the address-claimed
> > message is sent (even if it is a response to a request for address-
> > claimed), the EADDRNOTAVAIL error is expected in the 250 ms time window.
> > So, when a request for address-claimed message is received:
> > - You cannot send an address-claimed message with the socket bound with
> > J1939_NO_NAME because it is rejected with error EPROTO
> > - You can send an address-claimed message with the socket bound with the
> > name but you won't be able to send other messages within 250 ms because
> > they are rejected with error EADDRNOTAVAIL and this is against point d)
> > of ISO 11783-5 "4.5.2 Address claim requirements".
> 
> Ok, finally I understood it.
> 
> If I see it correctly, it is hard to fix the second part of "ISO 11783-5
>  4.5.2 d)" without breaking the first part of the same point.
> 
> How can I see the difference between a plain AC and an AC sent as a response
> to an RfAC? Wait 250ms? What if some system starts just at this time and
> sends a plain AC?
> 
> Regards,
> Oleksij

Hi Oleksij,

Honestly, I would apply the proposed patch because it is the easier solution
and makes the driver compliant with the standard, for the following
reasons:
- on the first claim, the kernel will wait 250 ms as stated by the
standard
- on successive claims with the same name and address, the kernel will
not wait 250 ms, which implies:
  - it will not wait after sending the address-claimed message when the
claimed address has been spoofed, but the standard does not explicitly
state what to do in this case (see previous emails in this thread), so
it would be up to the application developer to decide how to manage the
conflict
  - it will not wait after sending the address-claimed message when a
request for address-claimed message has been received, as stated by the
standard
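
To make the rule above concrete, here is a minimal user-space sketch of the
decision the patch encodes. It is not the kernel code: struct ecu_entry and
claim_restarts_holdoff() are hypothetical names used only for illustration,
and the real driver keeps considerably more state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one NAME -> address cache entry. */
struct ecu_entry {
	uint64_t name;	/* 64-bit NAME of the ECU */
	uint8_t addr;	/* source address currently mapped to this NAME */
};

/*
 * Return true if an address-claimed message with source address `sa`
 * should (re)start the 250 ms hold-off under the rule of the patch.
 * `known` is NULL when the NAME has not been seen before.
 */
static bool claim_restarts_holdoff(const struct ecu_entry *known, uint8_t sa)
{
	if (known && known->addr == sa)
		return false;	/* same NAME, same address: nothing to do */
	return true;		/* first claim, or claim of a different address */
}
```

With this rule a first claim and a claim of a new address still start the
hold-off, while a repeated claim of the already mapped address does not.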

Otherwise you will have to keep track of the above cases and decide whether
the wait is needed or not, but this is hard to accomplish because the
application is in charge of sending the address-claimed message, so you
would have to decide how long to keep track of the request for
address-claimed message, thus adding more complexity to the driver code.

Another solution is to let the driver itself send the address-claimed
message and then hold off successive messages for 250 ms, or not,
depending on the case.

Best Regards,
Devid


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-18 15:12                               ` Devid Antonio Filoni
@ 2022-11-19 10:12                                 ` Oleksij Rempel
  2022-11-20  0:11                                   ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-19 10:12 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Oliver Hartkopp, Kurt Van Dijck, kbuild test robot, Maxime Jayat,
	Robin van der Gracht, linux-kernel, Oleksij Rempel, Paolo Abeni,
	Marc Kleine-Budde, kernel, David Jander, Jakub Kicinski, netdev,
	linux-can, David S. Miller

On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> Hi Oleksij,
> 
> honestly I would apply proposed patch because it is the easier solution
> and makes the driver compliant with the standard for the following
> reasons:
> - on the first claim, the kernel will wait 250 ms as stated by the
> standard
> + on successive claims with the same name, the kernel will not wait
> 250ms, this implies:
>   - it will not wait after sending the address-claimed message when the
> claimed address has been spoofed, but the standard does not explicitly
> states what to do in this case (see previous emails in this thread), so
> it would be up to the application developer to decide how to manage the
> conflict
>   - it will not wait after sending the address-claimed message when a
> request for address-claimed message has been received as stated by the
> standard

Standard says:
1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
   after it has successfully claimed an address (Figure 4).
2. This does not apply when responding to a request for address claimed.

With the current code (without this patch): 1. is implemented and working as
expected, 2. is not implemented.
With this patch: 1. is partially broken and 2. only partially fakes the
needed behavior.

It will not wait if a remote ECU has rebooted for some reason. With this patch
we are breaking one case of the standard in favor of faking compatibility with
the other case. We should skip the wait only based on the presence of an RfAC,
not based on old_addr == new_addr.

In other words, part 2. should be implemented without breaking part 1.
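
A rough sketch of what "skip the wait only based on the presence of an RfAC"
could look like, written as stand-alone C rather than kernel code. Everything
here is an assumption for illustration: struct rfac_tracker, both function
names and the window_ms policy are hypothetical, and neither the standard text
quoted in this thread nor the driver defines how long a received RfAC should
excuse a reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interface state remembering the last received RfAC. */
struct rfac_tracker {
	bool seen;		/* at least one RfAC has been received */
	uint64_t last_rfac_ms;	/* monotonic time of the last received RfAC */
	uint32_t window_ms;	/* assumed policy: how long a RfAC excuses a reply */
};

/* Record that a request for address-claimed arrived at now_ms. */
static void rfac_received(struct rfac_tracker *t, uint64_t now_ms)
{
	t->seen = true;
	t->last_rfac_ms = now_ms;
}

/*
 * True if an address-claimed message sent at now_ms counts as a reply to
 * a RfAC, so that the 250 ms hold-off would not be applied
 * (the exception in 4.5.2 d).
 */
static bool ac_is_rfac_reply(const struct rfac_tracker *t, uint64_t now_ms)
{
	if (!t->seen || now_ms < t->last_rfac_ms)
		return false;
	return now_ms - t->last_rfac_ms <= t->window_ms;
}
```

A real implementation would probably also clear the `seen` flag once the reply
has been sent, so that one RfAC cannot excuse several claims.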

> Otherwise you will have to keep track of above cases and decide if the
> wait is needed or not, but this is hard do accomplish because is the
> application in charge of sending the address-claimed message, so you
> would have to decide how much to keep track of the request for address-
> claimed message thus adding more complexity to the code of the driver.

The current kernel already tracks all claims on the bus and knows all
registered NAMEs. I do not see increased complexity in this case.

IMHO, the only missing part is a user space interface. Something like "ip n"
will do.

> Another solution is to let the driver send the address-claimed message
> waiting or without waiting 250 ms for successive messages depending on
> the case.

You can send an "address-claimed message" at any time you want. The kernel
will just not resolve the NAME to an address until part 1. of the spec is
satisfied. Do not forget, the NAME cache is used for local _and_ remote
names. You can trick the local system, not the remote one.

Even if you implement "smart" logic in user space that knows better
than the kernel that this application is responding to an RfAC, you will
never know whether the address-claimed message of a remote system is a
response to an RfAC.

From this perspective, I do not see how allowing user space to break
the rules will help to solve the problem.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-19 10:12                                 ` Oleksij Rempel
@ 2022-11-20  0:11                                   ` Devid Antonio Filoni
  2022-11-20  8:45                                     ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-20  0:11 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Oliver Hartkopp, Kurt Van Dijck, kbuild test robot, Maxime Jayat,
	Robin van der Gracht, linux-kernel, Oleksij Rempel, Paolo Abeni,
	Marc Kleine-Budde, kernel, David Jander, Jakub Kicinski, netdev,
	linux-can, David S. Miller

On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > Hi Oleksij,
> > 
> > honestly I would apply proposed patch because it is the easier solution
> > and makes the driver compliant with the standard for the following
> > reasons:
> > - on the first claim, the kernel will wait 250 ms as stated by the
> > standard
> > + on successive claims with the same name, the kernel will not wait
> > 250ms, this implies:
> >   - it will not wait after sending the address-claimed message when the
> > claimed address has been spoofed, but the standard does not explicitly
> > states what to do in this case (see previous emails in this thread), so
> > it would be up to the application developer to decide how to manage the
> > conflict
> >   - it will not wait after sending the address-claimed message when a
> > request for address-claimed message has been received as stated by the
> > standard
> 
> Standard says:
> 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
>    after it has successfully claimed an address (Figure 4).
> 2. This does not apply when responding to a request for address claimed.
> 
> With current patch state: 1. is implemented and working as expected, 2.
> is not implemented.
> With this patch: 1. is partially broken and 2. is partially faking
> needed behavior.
> 
> It will not wait if remote ECU which rebooted for some reasons. With this patch
> we are breaking one case of the standard in favor to fake compatibility to the
> other case. We should avoid waiting only based on presence of RfAC not based
> on the old_addr == new_addr.

I'm sorry, I don't think I understood the point about the reboot ("It will
not wait if remote ECU which rebooted for some reasons"). If another ECU
rebooted, then *it* will have to perform the claim procedure again,
waiting 250 ms before beginning transmission. Your ECU doesn't have
to check whether the other ECUs respected the 250 ms wait.

Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
contention between two self-configurable-address CF)" of "4.5.4.2 -
Address-claim prioritization", shows that:
- ECU1 claims the address (time: 0 ms)
- ECU2 claims the same address (time: 0+x ms)
- ECU1's NAME has the higher priority, so ECU1 sends the address-claimed
message again as soon as it receives the address claim from ECU2
(time: 0+x+y ms)
- ECU1 starts normal transmission (time: 250 ms)
With the current implementation, ECU1 would start transmission at
time 0+x+y+250 ms; with the proposed patch it would not.
The same is shown in "Figure 7 (Resolving address contention between a
non-configurable address CF and a self-configurable address CF)": the ECU
waits another 250 ms only when claiming a different address.
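
The timing difference in the Figure 6 scenario can be worked through with a
small sketch. This is plain arithmetic, not driver code; tx_start_ms() and
HOLDOFF_MS are names made up for the example, and the value 25 ms for x+y used
in the note below is an arbitrary illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOLDOFF_MS 250u	/* hold-off after a successful claim */

/*
 * Earliest time (ms) at which ECU1 may start normal transmission, given
 * the times at which it sent its address-claimed messages (ascending).
 * restart_on_repeat = true:  every claim restarts the hold-off.
 * restart_on_repeat = false: only the first claim starts it.
 */
static uint32_t tx_start_ms(const uint32_t *claims_ms, size_t n,
			    bool restart_on_repeat)
{
	assert(claims_ms != NULL && n > 0);
	return (restart_on_repeat ? claims_ms[n - 1] : claims_ms[0]) + HOLDOFF_MS;
}
```

For claims at 0 ms and x+y = 25 ms, the first policy gives 275 ms and the
second gives 250 ms, which is the start time described for Figure 6.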

Also, as previously discussed in this thread, the standard states in
4.4.4.3 - Address violation:
If a CF receives a message, other than the address-claimed message,
which uses the CF's own SA,
then the CF:
- shall send the address-claim message to the Global address;
- shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
31
It is not *explicitly* stated that you have to wait 250 ms after the
address-claim message has been sent. Please note that the 250 ms wait is
mentioned only in "4.5 - Network initialization" while above statements
come from "4.4 - Network-management procedures". Also in this case, the
proposed patch is still standard compliant.
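
For reference, the diagnostic trouble code named in 4.4.4.3 depends only on
the violated source address. A small sketch of that mapping follows; struct
dtc_id and address_violation_dtc() are hypothetical helper names, and packing
the result into a DM1 payload is deliberately left out.

```c
#include <stdint.h>

#define ADDR_VIOLATION_SPN_BASE 2000u	/* SPN = 2000 + SA */
#define ADDR_VIOLATION_FMI 31u		/* FMI = 31 */

/* Hypothetical container for the two fields named in the text above. */
struct dtc_id {
	uint32_t spn;
	uint8_t fmi;
};

/* Build the SPN/FMI pair for an address violation on source address sa. */
static struct dtc_id address_violation_dtc(uint8_t sa)
{
	struct dtc_id d = { ADDR_VIOLATION_SPN_BASE + sa, ADDR_VIOLATION_FMI };
	return d;
}
```

For example, a violation on SA 0x80 (128 decimal) gives SPN 2128 with FMI 31.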

So I'm sorry but I have to disagree with you: there are many things
broken in the current implementation because it forces the 250 ms wait
in all cases, but it should not.

> 
> Without words 2. part should be implemented without breaking 1.
> 
> > Otherwise you will have to keep track of above cases and decide if the
> > wait is needed or not, but this is hard do accomplish because is the
> > application in charge of sending the address-claimed message, so you
> > would have to decide how much to keep track of the request for address-
> > claimed message thus adding more complexity to the code of the driver.
> 
> Current kernel already tracks all claims on the bus and knows all registered
> NAMEs. I do not see increased complicity in this case.

The kernel tracks the claims but it does *not track* incoming request
for address-claimed messages. It would have to, and it would also have to
allow the application to answer them *within a defined time window*. But
keep in mind that there are other cases where the 250 ms wait is wrong or
is not explicitly required by the standard.

> 
> IMHO, only missing part i a user space interface. Some thing like "ip n"
> will do.
> 
> > Another solution is to let the driver send the address-claimed message
> > waiting or without waiting 250 ms for successive messages depending on
> > the case.
> 
> You can send "address-claimed message" in any time you wont. Kernel will
> just not resolve the NAME to address until 1. part of the spec will
> apply. Do not forget, the NAME cache is used for local _and_ remote
> names. You can trick out local system, not remote.
> 
> Even if you implement "smart" logic in user space and will know better
> then kernel, that this application is responding to RfAC. You will newer
> know if address-claimed message of remote system is a response to RfAC.
> 
> From this perspective, I do not know, how allowing the user space break
> the rules will help to solve the problem?

I think you did not understand this last proposal: since the driver
already implements part of the standard, it might as well send
the address-claimed message when needed and wait 250 ms or not, depending
on the case.
In this way, for example, you would not have to keep track of a request
for address-claimed; you would just have to answer it directly.

Feel free to implement what you think is more appropriate but please
read the ISO11783-5 standard carefully too before changing the code,
there are many cases and it is not possible to simplify everything into
one rule.

Meanwhile I'm going to apply the patch to my own kernel. I tried to
work around the limitation by using a CAN_RAW socket to send the
address-claimed message, but the J1939 driver refuses to send other
messages in the 250 ms time window because it has detected the
address-claimed message sent from the other socket, so I can only apply
the patch to make it compliant with the standard.

> 
> Regards,
> Oleksij

Best Regards,
Devid


* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-20  0:11                                   ` Devid Antonio Filoni
@ 2022-11-20  8:45                                     ` Oleksij Rempel
  2022-11-20 19:18                                       ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-20  8:45 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Kurt Van Dijck, kbuild test robot, Maxime Jayat, Oliver Hartkopp,
	David Jander, linux-kernel, Oleksij Rempel, netdev,
	Marc Kleine-Budde, kernel, Robin van der Gracht, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-can

On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote:
> On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > > Hi Oleksij,
> > > 
> > > honestly I would apply proposed patch because it is the easier solution
> > > and makes the driver compliant with the standard for the following
> > > reasons:
> > > - on the first claim, the kernel will wait 250 ms as stated by the
> > > standard
> > > + on successive claims with the same name, the kernel will not wait
> > > 250ms, this implies:
> > >   - it will not wait after sending the address-claimed message when the
> > > claimed address has been spoofed, but the standard does not explicitly
> > > states what to do in this case (see previous emails in this thread), so
> > > it would be up to the application developer to decide how to manage the
> > > conflict
> > >   - it will not wait after sending the address-claimed message when a
> > > request for address-claimed message has been received as stated by the
> > > standard
> > 
> > Standard says:
> > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
> >    after it has successfully claimed an address (Figure 4).
> > 2. This does not apply when responding to a request for address claimed.
> > 
> > With current patch state: 1. is implemented and working as expected, 2.
> > is not implemented.
> > With this patch: 1. is partially broken and 2. is partially faking
> > needed behavior.
> > 
> > It will not wait if remote ECU which rebooted for some reasons. With this patch
> > we are breaking one case of the standard in favor to fake compatibility to the
> > other case. We should avoid waiting only based on presence of RfAC not based
> > on the old_addr == new_addr.
> 
> I'm sorry, I don't think I understood the point about reboot ("It will
> not wait if remote ECU which rebooted for some reasons"). If another ECU
> rebooted, then *it* will have to perform the claim procedure again
> waiting 250 ms before beginning the transmission. Your ECU doesn't have
> to check if the other ECUs respected the 250 ms wait.

With the proposed patch:
- a local application which is sending to the remote NAME will start or
  continue communication with an ECU which should stay silent.
- a local application which was manually or automatically restarted (see
  application watchdogs) will bypass completion of the address claim
  procedure and start sending without the 250 ms delay.

> Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
> contention between two self-configurable-address CF)" of "4.5.4.2 -
> Address-claim prioritization", shows that:
> - ECU1 claims the address (time: 0 ms)
> - ECU2 claims the same address (time: 0+x ms)
> - ECU1 NAME has the higher priority, so ECU1 sends again the address
> claimed message as soon as it received the address-claim from ECU2
> (time: 0+x+y ms)
> - ECU1 starts normal transmission (time: 250 ms)
> With current implementation, the ECU1 would start the transmission at
> time 0+x+y+250 ms, with proposed patch it would not.

You are right, this should be fixed.
But the proposed patch closes one issue and opens another: with this patch it
will be enough to send at least two address-claimed messages to bypass the
delay.

> Same is showed in "Figure 7 (Resolving address contention between a non-
> configurable address CF and a self-configurable address CF)", the ECU
> waits again 250 ms only when claiming a different address.

Ack

> Also, as previously discussed in this thread, the standard states in
> 4.4.4.3 - Address violation:
> If a CF receives a message, other than the address-claimed message,
> which uses the CF's own SA,
> then the CF:
> - shall send the address-claim message to the Global address;
> - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> 31
> It is not *explicitly* stated that you have to wait 250 ms after the
> address-claim message has been sent.

There is no need to explicitly state it. The requirement is clearly described
in 4.5.2.d part 1, with a clearly defined exception in 4.5.2.d part 2.
If something is not explicitly stated, the stated requirement always takes
priority.

> Please note that the 250 ms wait is  mentioned only in "4.5 - Network
> initialization"

OK, we need to refer to the wording used in specifications in
general:
Shall – Shall is used to designate a mandatory requirement.
Should – Should is used for requirements that are considered good and are
         recommended, but are not absolutely mandatory.
May – May is used for requirements that are optional.

If a requirement with wording as strong as "shall" is not strong enough for
you and you are using words such as ".. mentioned only in ..", then even a
statistical analysis of this spec will have no meaning. In all
cases we can just invalidate any argument by saying: it is only X or Y.

> while above statements come from "4.4 - Network-management procedures".
> Also in this case, the proposed patch is still standard compliant.

If we remove 4.5.2.d from the spec, then yes.

> So I'm sorry but I have to disagree with you, there are many things
> broken in the current implementation because it is forcing the 250 wait
> to all cases but it should not.

If we remove 4.5.2.d from the spec, then yes. Every construction is
logical if we adapt the input variables to the construction.

> > Without words 2. part should be implemented without breaking 1.
> > 
> > > Otherwise you will have to keep track of above cases and decide if the
> > > wait is needed or not, but this is hard do accomplish because is the
> > > application in charge of sending the address-claimed message, so you
> > > would have to decide how much to keep track of the request for address-
> > > claimed message thus adding more complexity to the code of the driver.
> > 
> > Current kernel already tracks all claims on the bus and knows all registered
> > NAMEs. I do not see increased complicity in this case.
> 
> The kernel tracks the claims but it does *not track* incoming requests
> for address-claimed message, it would have to and it would have to

yes

> allow the application to answer to it *within a defined time window*.

yes.

> But keep in mind that there are other cases when the 250 ms wait is wrong
> or it is not explicitly stated by the standard.

If it is not stated in the standard, how can we decide that it is wrong?
And if strongly worded statements have no value just because they are
stated only once, what should a proper standard look like?

> > IMHO, only missing part i a user space interface. Some thing like "ip n"
> > will do.
> > 
> > > Another solution is to let the driver send the address-claimed message
> > > waiting or without waiting 250 ms for successive messages depending on
> > > the case.
> > 
> > You can send "address-claimed message" in any time you wont. Kernel will
> > just not resolve the NAME to address until 1. part of the spec will
> > apply. Do not forget, the NAME cache is used for local _and_ remote
> > names. You can trick out local system, not remote.
> > 
> > Even if you implement "smart" logic in user space and will know better
> > then kernel, that this application is responding to RfAC. You will newer
> > know if address-claimed message of remote system is a response to RfAC.
> > 
> > From this perspective, I do not know, how allowing the user space break
> > the rules will help to solve the problem?
> 
> I think you did not understand this last proposal: since the driver is
> already implementing part of the standard, then it might as well send
> the address-claimed message when needed and wait 250 ms or not depending
> on the case.

Let's try the following test:
j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 &
while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done

And start candump with delta time stamps:
:~ candump -t d vcan0                                                 
 (000.000000)  vcan0  18EAFFFE   [3]  00 EE 00               
 (000.002437)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- no 250ms delay
 (000.011458)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.011964)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.011712)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012585)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012891)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012082)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012604)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012357)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012790)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012765)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012483)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012680)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012144)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
... snip ...
 (000.012592)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012515)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.013183)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012653)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.011886)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012836)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.009494)  vcan0  18EEFF80   [8]  77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed 
 (000.003362)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 ms after previous. No 250ms delay.
 (000.012351)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012983)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012602)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012594)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.012348)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
 (000.011922)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF

As you can see, the j1939 stack does not force the application to use NAMEs
and does not prevent sending any message within the 250 ms delay. The only
thing that has the 250 ms timer is NAME to address resolution, which should be
fixed with respect to 4.5.2.d without breaking everything else.
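
For readers who do not decode J1939 identifiers by eye, the three CAN IDs in
the candump output can be split into priority, PGN, destination and source
address with a few shifts. The sketch below is a generic 29-bit identifier
decoder written for this purpose, not a function of the kernel stack or of
can-utils; struct decoded_id and decode_j1939_can_id() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical result type for the fields of a 29-bit J1939 identifier. */
struct decoded_id {
	uint8_t priority;	/* 3 bits */
	uint32_t pgn;		/* 18 bits, PS cleared for PDU1 */
	uint8_t da;		/* destination address, 0xFF for PDU2 */
	uint8_t sa;		/* source address */
};

static struct decoded_id decode_j1939_can_id(uint32_t can_id)
{
	struct decoded_id out;
	uint32_t pgn = (can_id >> 8) & 0x3FFFF;	/* EDP, DP, PF, PS */
	uint8_t pf = (pgn >> 8) & 0xFF;		/* PDU format byte */

	out.priority = (can_id >> 26) & 0x7;
	out.sa = can_id & 0xFF;
	if (pf < 0xF0) {
		/* PDU1: the PS byte is the destination address */
		out.da = pgn & 0xFF;
		out.pgn = pgn & 0x3FF00;
	} else {
		/* PDU2: broadcast, PS is part of the PGN */
		out.da = 0xFF;
		out.pgn = pgn;
	}
	return out;
}
```

Applied to the trace: 18EAFFFE is PGN 0xEA00 (request) from SA 0xFE to 0xFF,
19239080 is PGN 0x12300 from SA 0x80 to 0x90, and 18EEFF80 is PGN 0xEE00
(address claimed) from SA 0x80 to the global address 0xFF.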

> In this way, for example, you won't have to keep track of a request for
> address-claimed, you just would have to answer to it directly.

see example above.

> Feel free to implement what you think is more appropriate but please
> read the ISO11783-5 standard carefully too before changing the code,
> there are many cases and it is not possible to simplify everything into
> one rule.

this is why we have this discussion.

> Meanwhile I'm going to apply the patch to my own kernel, I've tried to
> workaround the limitation using a CAN_RAW socket to send the address-
> claimed message but the J1939 driver refuses to send other messages in
> the 250 ms time window because it has detected the address-claimed
> message sent from the other socket, so I can only apply the patch to
> make it compliant with the standard.

If you can use CAN_RAW, you can use the above example without any delay.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-20  8:45                                     ` Oleksij Rempel
@ 2022-11-20 19:18                                       ` Devid Antonio Filoni
  2022-11-21  5:19                                         ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-20 19:18 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Kurt Van Dijck, kbuild test robot, Maxime Jayat, Oliver Hartkopp,
	David Jander, linux-kernel, Oleksij Rempel, netdev,
	Marc Kleine-Budde, kernel, Robin van der Gracht, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-can

On Sun, 2022-11-20 at 09:45 +0100, Oleksij Rempel wrote:
> On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote:
> > On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > > > Hi Oleksij,
> > > > 
> > > > honestly I would apply proposed patch because it is the easier solution
> > > > and makes the driver compliant with the standard for the following
> > > > reasons:
> > > > - on the first claim, the kernel will wait 250 ms as stated by the
> > > > standard
> > > > + on successive claims with the same name, the kernel will not wait
> > > > 250ms, this implies:
> > > >   - it will not wait after sending the address-claimed message when the
> > > > claimed address has been spoofed, but the standard does not explicitly
> > > > states what to do in this case (see previous emails in this thread), so
> > > > it would be up to the application developer to decide how to manage the
> > > > conflict
> > > >   - it will not wait after sending the address-claimed message when a
> > > > request for address-claimed message has been received as stated by the
> > > > standard
> > > 
> > > Standard says:
> > > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
> > >    after it has successfully claimed an address (Figure 4).
> > > 2. This does not apply when responding to a request for address claimed.
> > > 
> > > With current patch state: 1. is implemented and working as expected, 2.
> > > is not implemented.
> > > With this patch: 1. is partially broken and 2. is partially faking
> > > needed behavior.
> > > 
> > > It will not wait if remote ECU which rebooted for some reasons. With this patch
> > > we are breaking one case of the standard in favor to fake compatibility to the
> > > other case. We should avoid waiting only based on presence of RfAC not based
> > > on the old_addr == new_addr.
> > 
> > I'm sorry, I don't think I understood the point about reboot ("It will
> > not wait if remote ECU which rebooted for some reasons"). If another ECU
> > rebooted, then *it* will have to perform the claim procedure again
> > waiting 250 ms before beginning the transmission. Your ECU doesn't have
> > to check if the other ECUs respected the 250 ms wait.
> 
> With proposed patch:
> - local application which is sending to the remote NAME, will start or continue
>   communication with ECU which should stay silent.

And this is not forbidden by the standard: it states that the remote ECU
shall not start or continue communication, but it can still *receive*
messages.
For example, what would you do if:
- during the 250 ms wait, another ECU sends a request-for-address-claimed
message addressed to the address you're claiming?
From "4.5.3 Other requirements for initialization":
A CF shall respond to a request-for-address-claimed message when the
destination address is the same as the CF's address and shall transmit
its response to the Global address (255).
- during the 250 ms wait another ECU sends a normal message (non
address-claim related) using the SA you're currently claiming?

> - local application which was manually or automatically restarted (see
>   application watchdogs), will bypass address claim procedure
>   completion and start sending without 250ms delay.

You're right, the application would then be violating the standard.
However, please note that, as per the driver implementation, each time
the socket is closed and opened again (if bound with a name) you have to
send the address-claimed message again.
The standard also states how to treat this kind of violation on the
remote ECU side.

> 
> > Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
> > contention between two self-configurable-address CF)" of "4.5.4.2 -
> > Address-claim prioritization", shows that:
> > - ECU1 claims the address (time: 0 ms)
> > - ECU2 claims the same address (time: 0+x ms)
> > - ECU1 NAME has the higher priority, so ECU1 sends again the address
> > claimed message as soon as it received the address-claim from ECU2
> > (time: 0+x+y ms)
> > - ECU1 starts normal transmission (time: 250 ms)
> > With current implementation, the ECU1 would start the transmission at
> > time 0+x+y+250 ms, with proposed patch it would not.
> 
> You are right, this should be fixed.
> But proposed patch closes one issues and opens another, with this patch it will
> be enough to send at least two address claimed messages to bypass the delay.

No, because the timer associated with the first claim *is not stopped*.
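
The point about the timer can be illustrated with a small user-space model.
This is only a sketch of the behaviour described here, not the kernel code:
struct claim_state, on_address_claimed() and may_transmit() are hypothetical
names, and the current time is simply passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOLDOFF_MS 250

/* Hypothetical per-NAME state, far simpler than what the driver keeps. */
struct claim_state {
	bool     active;	/* an address is currently claimed */
	uint8_t  addr;		/* claimed source address */
	uint64_t ready_at_ms;	/* time from which transmission is allowed */
};

/* Handle an address-claimed message for this NAME seen at time now_ms. */
static void on_address_claimed(struct claim_state *st, uint8_t sa,
			       uint64_t now_ms)
{
	assert(st != NULL);

	/* Same NAME, same address: the timer of the first claim keeps running. */
	if (st->active && st->addr == sa)
		return;

	/* First claim, or claim of a different address: start a new wait. */
	st->active = true;
	st->addr = sa;
	st->ready_at_ms = now_ms + HOLDOFF_MS;
}

static bool may_transmit(const struct claim_state *st, uint64_t now_ms)
{
	assert(st != NULL);
	return st->active && now_ms >= st->ready_at_ms;
}
```

With a first claim of 0x80 at 0 ms and a repeated claim of 0x80 at 40 ms,
may_transmit() in this model stays false until 250 ms, so sending the claim
twice does not shorten the wait. A later claim of a different address starts
a new 250 ms wait.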

> 
> > Same is showed in "Figure 7 (Resolving address contention between a non-
> > configurable address CF and a self-configurable address CF)", the ECU
> > waits again 250 ms only when claiming a different address.
> 
> Ack
> 
> > Also, as previously discussed in this thread, the standard states in
> > 4.4.4.3 - Address violation:
> > If a CF receives a message, other than the address-claimed message,
> > which uses the CF's own SA,
> > then the CF:
> > - shall send the address-claim message to the Global address;
> > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > 31
> > It is not *explicitly* stated that you have to wait 250 ms after the
> > address-claim message has been sent.
> 
> There is no need to explicitly state it. The requirement is clearly described
> in the 4.5.2.d part 1 with clearly defined exception in  4.5.2.d part 2.
> If something is not explicitly stated, the stated requirement has always
> priority.
> 
> > Please note that the 250 ms wait is  mentioned only in "4.5 - Network
> > initialization"
> 
> OK, we need to refer to the wording used in a specifications, in
> general:
> Shall – Shall is used to designate a mandatory requirement.
> Should – Should is used for requirements that are considered good and are
>          recommended, but are not absolutely mandatory.
> May – May is used to for requirements that are optional.
> 
> If a requirement with strong wording as "shall" is not strong enough for
> you and you are suing words as ".. mentioned only in .." then even a
> statistical analysis of this spec will have no meaning. In all
> cases we can just invalidate all arguments by using: it is only X or Y. 
> 
> > while above statements come from "4.4 - Network-management procedures".
> > Also in this case, the proposed patch is still standard compliant.
> 
> If we remove 4.5.2.d from the spec, then yes.
> 
> > So I'm sorry but I have to disagree with you, there are many things
> > broken in the current implementation because it is forcing the 250 wait
> > to all cases but it should not.
> 
> If we remove 4.5.2.d from the spec, then yes. Every construction is
> logical if we adopt input variables to the construction.

From "4.4.4.3 - Address violation":
- *shall send the address-claim message* to the Global address
From "4.5.2 Address claim requirements":
- No CF shall begin, or resume, transmission on the network until 250 ms
after it has successfully *claimed an address*, except when responding
to a request for address-claimed.

Do you see any difference?
With your interpretation of the standard, the 4.5.2.d sentence above
would have to read:
- No CF shall begin, or resume, transmission on the network until 250 ms
after it has successfully *sent the address-claim message*, except when
responding to a request for address-claimed.

I think "it has successfully claimed an address" refers to the whole
claim procedure, not only to the address-claimed message.

Please note that the ECU shall also send the address-claim message when
it receives a request for a matching NAME ("4.4.3.2 NAME management (NM)
message"). This does not mean that it is claiming the address again.

> 
> > > Without words 2. part should be implemented without breaking 1.
> > > 
> > > > Otherwise you will have to keep track of above cases and decide if the
> > > > wait is needed or not, but this is hard do accomplish because is the
> > > > application in charge of sending the address-claimed message, so you
> > > > would have to decide how much to keep track of the request for address-
> > > > claimed message thus adding more complexity to the code of the driver.
> > > 
> > > Current kernel already tracks all claims on the bus and knows all registered
> > > NAMEs. I do not see increased complicity in this case.
> > 
> > The kernel tracks the claims but it does *not track* incoming requests
> > for address-claimed message, it would have to and it would have to
> 
> yes
> 
> > allow the application to answer to it *within a defined time window*.
> 
> yes.
> 
> > But keep in mind that there are other cases when the 250 ms wait is wrong
> > or it is not explicitly stated by the standard.
> 
> If it is not stated in the standard how can we decide if it is wrong?
And how can we decide if it is right? :)

> And if strongly worded statements have no value just because it is
> stated only one time, how proper standard should look like? 
See above.

> 
> > > IMHO, only missing part i a user space interface. Some thing like "ip n"
> > > will do.
> > > 
> > > > Another solution is to let the driver send the address-claimed message
> > > > waiting or without waiting 250 ms for successive messages depending on
> > > > the case.
> > > 
> > > You can send "address-claimed message" in any time you wont. Kernel will
> > > just not resolve the NAME to address until 1. part of the spec will
> > > apply. Do not forget, the NAME cache is used for local _and_ remote
> > > names. You can trick out local system, not remote.
> > > 
> > > Even if you implement "smart" logic in user space and will know better
> > > then kernel, that this application is responding to RfAC. You will newer
> > > know if address-claimed message of remote system is a response to RfAC.
> > > 
> > > From this perspective, I do not know, how allowing the user space break
> > > the rules will help to solve the problem?
> > 
> > I think you did not understand this last proposal: since the driver is
> > already implementing part of the standard, then it might as well send
> > the address-claimed message when needed and wait 250 ms or not depending
> > on the case.
> 
> Let's try following test:
> j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 &
> while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done
> 
> And start candump with delta time stamps:
> :~ candump -t d vcan0                                                 
>  (000.000000)  vcan0  18EAFFFE   [3]  00 EE 00               
>  (000.002437)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- no 250ms delay
>  (000.011458)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.011964)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.011712)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012585)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012891)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012082)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012604)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012357)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012790)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012765)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012483)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012680)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012144)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> ... snip ...
>  (000.012592)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012515)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.013183)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012653)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.011886)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012836)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.009494)  vcan0  18EEFF80   [8]  77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed 
>  (000.003362)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 ms after previous. No 250ms delay.
>  (000.012351)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012983)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012602)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012594)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.012348)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
>  (000.011922)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> 
> As you can see, the j1939 stack does not force applications to use NAMEs and
> does not prevent sending any message within the 250 ms delay. The only thing
> that has the 250 ms timer is NAME-to-address resolution, which should be fixed
> with respect to 4.5.2.d without breaking everything else.

Yes, this is clear: it works because the socket used by testj1939 is not
bound to any name.

Just to clarify: are you suggesting that application developers use one
socket (bound with the name) to manage the address claim and another one
(bound without the name) for other transmissions? If so, then why does
that code exist in the driver?
Honestly, I would consider this proposal really bad, since it would allow
the standard to be violated completely. I really hope you agree with me
about this.
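
To make the two kinds of socket concrete, here is a sketch of the bind
addresses involved, assuming the CAN_J1939 socket API from <linux/can.h> and
<linux/can/j1939.h>. The helper names j1939_fill_named() and
j1939_fill_static() are made up for illustration, and nothing here opens a
socket or touches an interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/j1939.h>

/* Bind address for a socket that identifies itself by NAME. */
static void j1939_fill_named(struct sockaddr_can *sa, int ifindex,
			     uint64_t name)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->can_family = AF_CAN;
	sa->can_ifindex = ifindex;
	sa->can_addr.j1939.name = name;
	/* No fixed address: it is resolved from the NAME after the claim. */
	sa->can_addr.j1939.addr = J1939_IDLE_ADDR;
	sa->can_addr.j1939.pgn = J1939_NO_PGN;
}

/* Bind address for a socket that uses a fixed source address and no NAME. */
static void j1939_fill_static(struct sockaddr_can *sa, int ifindex,
			      uint8_t addr)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->can_family = AF_CAN;
	sa->can_ifindex = ifindex;
	sa->can_addr.j1939.name = J1939_NO_NAME;
	sa->can_addr.j1939.addr = addr;
	sa->can_addr.j1939.pgn = J1939_NO_PGN;
}
```

Either structure would then be passed to bind() on a socket created with
socket(PF_CAN, SOCK_DGRAM, CAN_J1939). The testj1939 call in the quoted
example corresponds to the static form with address 0x80, which is why no
NAME-to-address resolution, and therefore no 250 ms wait, is involved there.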

> 
> > In this way, for example, you won't have to keep track of a request for
> > address-claimed, you just would have to answer to it directly.
> 
> see example above.
> 
> > Feel free to implement what you think is more appropriate but please
> > read the ISO11783-5 standard carefully too before changing the code,
> > there are many cases and it is not possible to simplify everything into
> > one rule.
> 
> this is why we have this discussion.
> 
> > Meanwhile I'm going to apply the patch to my own kernel, I've tried to
> > workaround the limitation using a CAN_RAW socket to send the address-
> > claimed message but the J1939 driver refuses to send other messages in
> > the 250 ms time window because it has detected the address-claimed
> > message sent from the other socket, so I can only apply the patch to
> > make it compliant with the standard.
> 
> If you can use CAN_RAW, you can use the above example without any delay.
> 
> Regards,
> Oleksij

Best Regards,
Devid


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-20 19:18                                       ` Devid Antonio Filoni
@ 2022-11-21  5:19                                         ` Oleksij Rempel
  2022-11-23 20:39                                           ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-21  5:19 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Kurt Van Dijck, kbuild test robot, Maxime Jayat, Oliver Hartkopp,
	David Jander, linux-kernel, Oleksij Rempel, netdev,
	Marc Kleine-Budde, kernel, Robin van der Gracht, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-can

On Sun, Nov 20, 2022 at 08:18:32PM +0100, Devid Antonio Filoni wrote:
> On Sun, 2022-11-20 at 09:45 +0100, Oleksij Rempel wrote:
> > On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote:
> > > On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> > > > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > > > > Hi Oleksij,
> > > > > 
> > > > > honestly I would apply proposed patch because it is the easier solution
> > > > > and makes the driver compliant with the standard for the following
> > > > > reasons:
> > > > > - on the first claim, the kernel will wait 250 ms as stated by the
> > > > > standard
> > > > > + on successive claims with the same name, the kernel will not wait
> > > > > 250ms, this implies:
> > > > >   - it will not wait after sending the address-claimed message when the
> > > > > claimed address has been spoofed, but the standard does not explicitly
> > > > > states what to do in this case (see previous emails in this thread), so
> > > > > it would be up to the application developer to decide how to manage the
> > > > > conflict
> > > > >   - it will not wait after sending the address-claimed message when a
> > > > > request for address-claimed message has been received as stated by the
> > > > > standard
> > > > 
> > > > Standard says:
> > > > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
> > > >    after it has successfully claimed an address (Figure 4).
> > > > 2. This does not apply when responding to a request for address claimed.
> > > > 
> > > > With current patch state: 1. is implemented and working as expected, 2.
> > > > is not implemented.
> > > > With this patch: 1. is partially broken and 2. is partially faking
> > > > needed behavior.
> > > > 
> > > > It will not wait if remote ECU which rebooted for some reasons. With this patch
> > > > we are breaking one case of the standard in favor to fake compatibility to the
> > > > other case. We should avoid waiting only based on presence of RfAC not based
> > > > on the old_addr == new_addr.
> > > 
> > > I'm sorry, I don't think I understood the point about reboot ("It will
> > > not wait if remote ECU which rebooted for some reasons"). If another ECU
> > > rebooted, then *it* will have to perform the claim procedure again
> > > waiting 250 ms before beginning the transmission. Your ECU doesn't have
> > > to check if the other ECUs respected the 250 ms wait.
> > 
> > With proposed patch:
> > - local application which is sending to the remote NAME, will start or continue
> >   communication with ECU which should stay silent.
> 
> And this is not forbidden by the standard: it states that the remote ECU
> shall not start or continue communication, but it can still *receive*
> messages.
> For example, what would you do if:
> - during the 250 ms wait, another ECU sends a request-for-address-claimed
> message addressed to the address you're claiming?
> From "4.5.3 Other requirements for initialization":
> A CF shall respond to a request-for-address-claimed message when the
> destination address is the same as the CF's address and shall transmit
> its response to the Global address (255).
> - during the 250 ms wait another ECU sends a normal message (non
> address-claim related) using the SA you're currently claiming?
> 
> > - local application which was manually or automatically restarted (see
> >   application watchdogs), will bypass address claim procedure
> >   completion and start sending without 250ms delay.
> 
> You're right, the application would then be violating the standard.
> However, please note that, as per the driver implementation, each time
> the socket is closed and opened again (if bound with a name) you have to
> send the address-claimed message again.
> The standard also states how to treat this kind of violation on the
> remote ECU side.
> 
> > 
> > > Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
> > > contention between two self-configurable-address CF)" of "4.5.4.2 -
> > > Address-claim prioritization", shows that:
> > > - ECU1 claims the address (time: 0 ms)
> > > - ECU2 claims the same address (time: 0+x ms)
> > > - ECU1 NAME has the higher priority, so ECU1 sends again the address
> > > claimed message as soon as it received the address-claim from ECU2
> > > (time: 0+x+y ms)
> > > - ECU1 starts normal transmission (time: 250 ms)
> > > With current implementation, the ECU1 would start the transmission at
> > > time 0+x+y+250 ms, with proposed patch it would not.
> > 
> > You are right, this should be fixed.
> > But proposed patch closes one issues and opens another, with this patch it will
> > be enough to send at least two address claimed messages to bypass the delay.
> 
> No, because the timer associated with the first claim *is not stopped*.
> 
> > 
> > > Same is showed in "Figure 7 (Resolving address contention between a non-
> > > configurable address CF and a self-configurable address CF)", the ECU
> > > waits again 250 ms only when claiming a different address.
> > 
> > Ack
> > 
> > > Also, as previously discussed in this thread, the standard states in
> > > 4.4.4.3 - Address violation:
> > > If a CF receives a message, other than the address-claimed message,
> > > which uses the CF's own SA,
> > > then the CF:
> > > - shall send the address-claim message to the Global address;
> > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > 31
> > > It is not *explicitly* stated that you have to wait 250 ms after the
> > > address-claim message has been sent.
> > 
> > There is no need to explicitly state it. The requirement is clearly described
> > in the 4.5.2.d part 1 with clearly defined exception in  4.5.2.d part 2.
> > If something is not explicitly stated, the stated requirement has always
> > priority.
> > 
> > > Please note that the 250 ms wait is  mentioned only in "4.5 - Network
> > > initialization"
> > 
> > OK, we need to refer to the wording used in a specifications, in
> > general:
> > Shall – Shall is used to designate a mandatory requirement.
> > Should – Should is used for requirements that are considered good and are
> >          recommended, but are not absolutely mandatory.
> > May – May is used to for requirements that are optional.
> > 
> > If a requirement with strong wording as "shall" is not strong enough for
> > you and you are suing words as ".. mentioned only in .." then even a
> > statistical analysis of this spec will have no meaning. In all
> > cases we can just invalidate all arguments by using: it is only X or Y. 
> > 
> > > while above statements come from "4.4 - Network-management procedures".
> > > Also in this case, the proposed patch is still standard compliant.
> > 
> > If we remove 4.5.2.d from the spec, then yes.
> > 
> > > So I'm sorry but I have to disagree with you, there are many things
> > > broken in the current implementation because it is forcing the 250 wait
> > > to all cases but it should not.
> > 
> > If we remove 4.5.2.d from the spec, then yes. Every construction is
> > logical if we adopt input variables to the construction.
> 
> From "4.4.4.3 - Address violation":
> - *shall send the address-claim message* to the Global address
> From "4.5.2 Address claim requirements":
> - No CF shall begin, or resume, transmission on the network until 250 ms
> after it has successfully *claimed an address*, except when responding
> to a request for address-claimed.
> 
> Do you see any difference?
> With your interpretation of the standard, the 4.5.2.d sentence above
> would have to read:
> - No CF shall begin, or resume, transmission on the network until 250 ms
> after it has successfully *sent the address-claim message*, except when
> responding to a request for address-claimed.
> 
> I think "it has successfully claimed an address" refers to the whole
> claim procedure, not only to the address-claimed message.
> 
> Please note that the ECU shall also send the address-claim message when
> it receives a request for a matching NAME ("4.4.3.2 NAME management (NM)
> message"). This does not mean that it is claiming the address again.
> 
> > 
> > > > Without words 2. part should be implemented without breaking 1.
> > > > 
> > > > > Otherwise you will have to keep track of above cases and decide if the
> > > > > wait is needed or not, but this is hard do accomplish because is the
> > > > > application in charge of sending the address-claimed message, so you
> > > > > would have to decide how much to keep track of the request for address-
> > > > > claimed message thus adding more complexity to the code of the driver.
> > > > 
> > > > Current kernel already tracks all claims on the bus and knows all registered
> > > > NAMEs. I do not see increased complicity in this case.
> > > 
> > > The kernel tracks the claims but it does *not track* incoming requests
> > > for address-claimed message, it would have to and it would have to
> > 
> > yes
> > 
> > > allow the application to answer to it *within a defined time window*.
> > 
> > yes.
> > 
> > > But keep in mind that there are other cases when the 250 ms wait is wrong
> > > or it is not explicitly stated by the standard.
> > 
> > If it is not stated in the standard how can we decide if it is wrong?
> And how can we decide if it is right? :)
> 
> > And if strongly worded statements have no value just because it is
> > stated only one time, how proper standard should look like? 
> See above.
> 
> > 
> > > > IMHO, only missing part i a user space interface. Some thing like "ip n"
> > > > will do.
> > > > 
> > > > > Another solution is to let the driver send the address-claimed message
> > > > > waiting or without waiting 250 ms for successive messages depending on
> > > > > the case.
> > > > 
> > > > You can send "address-claimed message" in any time you wont. Kernel will
> > > > just not resolve the NAME to address until 1. part of the spec will
> > > > apply. Do not forget, the NAME cache is used for local _and_ remote
> > > > names. You can trick out local system, not remote.
> > > > 
> > > > Even if you implement "smart" logic in user space and will know better
> > > > then kernel, that this application is responding to RfAC. You will newer
> > > > know if address-claimed message of remote system is a response to RfAC.
> > > > 
> > > > From this perspective, I do not know, how allowing the user space break
> > > > the rules will help to solve the problem?
> > > 
> > > I think you did not understand this last proposal: since the driver is
> > > already implementing part of the standard, then it might as well send
> > > the address-claimed message when needed and wait 250 ms or not depending
> > > on the case.
> > 
> > Let's try following test:
> > j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 &
> > while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done
> > 
> > And start candump with delta time stamps:
> > :~ candump -t d vcan0                                                 
> >  (000.000000)  vcan0  18EAFFFE   [3]  00 EE 00               
> >  (000.002437)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- no 250ms delay
> >  (000.011458)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.011964)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.011712)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012585)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012891)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012082)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012604)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012357)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012790)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012765)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012483)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012680)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012144)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > ... snip ...
> >  (000.012592)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012515)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.013183)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012653)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.011886)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012836)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.009494)  vcan0  18EEFF80   [8]  77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed 
> >  (000.003362)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 ms after previous. No 250ms delay.
> >  (000.012351)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012983)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012602)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012594)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.012348)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> >  (000.011922)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > 
> > As you can see, the j1939 stack does not force applications to use NAMEs and
> > does not prevent sending any message within the 250 ms delay. The only thing
> > that has the 250 ms timer is NAME-to-address resolution, which should be fixed
> > with respect to 4.5.2.d without breaking everything else.
> 
> Yes, this is clear: it works because the socket used by testj1939 is not
> bound to any name.
> 
> Just to clarify: are you suggesting that application developers use one
> socket (bound with the name) to manage the address claim and another one
> (bound without the name) for other transmissions? If so, then why does
> that code exist in the driver?
> Honestly, I would consider this proposal really bad, since it would allow
> the standard to be violated completely. I really hope you agree with me
> about this.

Hm... you are right.

Please add code comments to your patch with snippets from the standard
and a clarification of why it should be so. Commit messages are often
overlooked.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-21  5:19                                         ` Oleksij Rempel
@ 2022-11-23 20:39                                           ` Devid Antonio Filoni
  2022-11-24  5:16                                             ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-23 20:39 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Kurt Van Dijck, kbuild test robot, Maxime Jayat, Oliver Hartkopp,
	David Jander, linux-kernel, Oleksij Rempel, netdev,
	Marc Kleine-Budde, kernel, Robin van der Gracht, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-can

On Mon, 2022-11-21 at 06:19 +0100, Oleksij Rempel wrote:
> On Sun, Nov 20, 2022 at 08:18:32PM +0100, Devid Antonio Filoni wrote:
> > On Sun, 2022-11-20 at 09:45 +0100, Oleksij Rempel wrote:
> > > On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote:
> > > > On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> > > > > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > > > > > Hi Oleksij,
> > > > > > 
> > > > > > honestly I would apply proposed patch because it is the easier solution
> > > > > > and makes the driver compliant with the standard for the following
> > > > > > reasons:
> > > > > > - on the first claim, the kernel will wait 250 ms as stated by the
> > > > > > standard
> > > > > > + on successive claims with the same name, the kernel will not wait
> > > > > > 250ms, this implies:
> > > > > >   - it will not wait after sending the address-claimed message when the
> > > > > > claimed address has been spoofed, but the standard does not explicitly
> > > > > > > state what to do in this case (see previous emails in this thread), so
> > > > > > it would be up to the application developer to decide how to manage the
> > > > > > conflict
> > > > > >   - it will not wait after sending the address-claimed message when a
> > > > > > request for address-claimed message has been received as stated by the
> > > > > > standard
> > > > > 
> > > > > Standard says:
> > > > > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
> > > > >    after it has successfully claimed an address (Figure 4).
> > > > > 2. This does not apply when responding to a request for address claimed.
> > > > > 
> > > > > With current patch state: 1. is implemented and working as expected, 2.
> > > > > is not implemented.
> > > > > With this patch: 1. is partially broken and 2. is partially faking
> > > > > needed behavior.
> > > > > 
> > > > > It will not wait if remote ECU which rebooted for some reasons. With this patch
> > > > > we are breaking one case of the standard in favor to fake compatibility to the
> > > > > other case. We should avoid waiting only based on presence of RfAC not based
> > > > > on the old_addr == new_addr.
> > > > 
> > > > I'm sorry, I don't think I understood the point about reboot ("It will
> > > > not wait if remote ECU which rebooted for some reasons"). If another ECU
> > > > rebooted, then *it* will have to perform the claim procedure again
> > > > waiting 250 ms before beginning the transmission. Your ECU doesn't have
> > > > to check if the other ECUs respected the 250 ms wait.
> > > 
> > > With proposed patch:
> > > - local application which is sending to the remote NAME, will start or continue
> > >   communication with ECU which should stay silent.
> > 
> > And this is not forbidden by the standard, the standard states that the
> > remote ECU shall not start or continue the communication but it can
> > *receive* messages.
> > For example, what would you do if:
> > - during the 250 ms wait, another ECU sends a request-for-address-
> > claimed message meant to the address you're claiming?
> > From "4.5.3 Other requirements for initialization":
> > A CF shall respond to a request-for-address-claimed message when the
> > destination address is the same as the CF's address and shall transmit
> > its response to the Global address (255).
> > - during the 250 ms wait another ECU sends a normal message (non
> > address-claim related) using the SA you're currently claiming?
> > 
> > > - local application which was manually or automatically restarted (see
> > >   application watchdogs), will bypass address claim procedure
> > >   completion and start sending without 250ms delay.
> > 
> > Then the application will be violating the standard, you're right,
> > however please note that, as per driver implementation, each time the
> > socket is closed and opened again (if bound with a name) you have to
> > send the address-claimed message again.
> > The standard also states how to treat this kind of violations on the
> > remote ECU side.
> > 
> > > 
> > > > Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
> > > > contention between two self-configurable-address CF)" of "4.5.4.2 -
> > > > Address-claim prioritization", shows that:
> > > > - ECU1 claims the address (time: 0 ms)
> > > > - ECU2 claims the same address (time: 0+x ms)
> > > > - ECU1 NAME has the higher priority, so ECU1 sends again the address
> > > > claimed message as soon as it received the address-claim from ECU2
> > > > (time: 0+x+y ms)
> > > > - ECU1 starts normal transmission (time: 250 ms)
> > > > With current implementation, the ECU1 would start the transmission at
> > > > time 0+x+y+250 ms, with proposed patch it would not.
> > > 
> > > You are right, this should be fixed.
> > > But proposed patch closes one issues and opens another, with this patch it will
> > > be enough to send at least two address claimed messages to bypass the delay.
> > 
> > No, because the timer associated with the first claim *is not stopped*.
> > 
> > > 
> > > > > The same is shown in "Figure 7 (Resolving address contention between a non-
> > > > > configurable address CF and a self-configurable address CF)", the ECU
> > > > waits again 250 ms only when claiming a different address.
> > > 
> > > Ack
> > > 
> > > > Also, as previously discussed in this thread, the standard states in
> > > > 4.4.4.3 - Address violation:
> > > > If a CF receives a message, other than the address-claimed message,
> > > > which uses the CF's own SA,
> > > > then the CF:
> > > > - shall send the address-claim message to the Global address;
> > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > 31
> > > > It is not *explicitly* stated that you have to wait 250 ms after the
> > > > address-claim message has been sent.
> > > 
> > > There is no need to explicitly state it. The requirement is clearly described
> > > in the 4.5.2.d part 1 with clearly defined exception in  4.5.2.d part 2.
> > > If something is not explicitly stated, the stated requirement has always
> > > priority.
> > > 
> > > > Please note that the 250 ms wait is  mentioned only in "4.5 - Network
> > > > initialization"
> > > 
> > > OK, we need to refer to the wording used in a specifications, in
> > > general:
> > > Shall – Shall is used to designate a mandatory requirement.
> > > Should – Should is used for requirements that are considered good and are
> > >          recommended, but are not absolutely mandatory.
> > > > May – May is used for requirements that are optional.
> > > 
> > > If a requirement with strong wording as "shall" is not strong enough for
> > > > you and you are using words as ".. mentioned only in .." then even a
> > > statistical analysis of this spec will have no meaning. In all
> > > cases we can just invalidate all arguments by using: it is only X or Y. 
> > > 
> > > > while above statements come from "4.4 - Network-management procedures".
> > > > Also in this case, the proposed patch is still standard compliant.
> > > 
> > > If we remove 4.5.2.d from the spec, then yes.
> > > 
> > > > So I'm sorry but I have to disagree with you, there are many things
> > > > broken in the current implementation because it is forcing the 250 wait
> > > > to all cases but it should not.
> > > 
> > > If we remove 4.5.2.d from the spec, then yes. Every construction is
> > > logical if we adopt input variables to the construction.
> > 
> > From "4.4.4.3 - Address violation":
> > - *shall send the address-claim message* to the Global address
> > From "4.5.2 Address claim requirements":
> > - No CF shall begin, or resume, transmission on the network until 250 ms
> > after it has successfully *claimed an address*, except when responding
> > to a request for address-claimed.
> > 
> > Do you see any difference?
> > With your interpretation of the standard, then above 4.5.2.d sentence
> > shall be:
> > - No CF shall begin, or resume, transmission on the network until 250 ms
> > after it has successfully *sent the address-claim message*, except when
> > responding to a request for address-claimed.
> > 
> > I think "it has successfully claimed an address" is valid for the whole
> > claim procedure and not for the address-claimed message only.
> > 
> > Please note that the ECU shall send the address-claim message also when
> > it receives a request for a matching NAME ("4.4.3.2 NAME management (NM)
> > message"). This does not mean that is claiming again the address.
> > 
> > > 
> > > > > Without words 2. part should be implemented without breaking 1.
> > > > > 
> > > > > > Otherwise you will have to keep track of above cases and decide if the
> > > > > > > wait is needed or not, but this is hard to accomplish because it is the
> > > > > > application in charge of sending the address-claimed message, so you
> > > > > > would have to decide how much to keep track of the request for address-
> > > > > > claimed message thus adding more complexity to the code of the driver.
> > > > > 
> > > > > Current kernel already tracks all claims on the bus and knows all registered
> > > > > > NAMEs. I do not see increased complexity in this case.
> > > > 
> > > > The kernel tracks the claims but it does *not track* incoming requests
> > > > for address-claimed message, it would have to and it would have to
> > > 
> > > yes
> > > 
> > > > allow the application to answer to it *within a defined time window*.
> > > 
> > > yes.
> > > 
> > > > But keep in mind that there are other cases when the 250 ms wait is wrong
> > > > or it is not explicitly stated by the standard.
> > > 
> > > If it is not stated in the standard how can we decide if it is wrong?
> > And how can we decide if it is right? :)
> > 
> > > And if strongly worded statements have no value just because it is
> > > stated only one time, how proper standard should look like? 
> > See above.
> > 
> > > 
> > > > > > IMHO, the only missing part is a user space interface. Something like "ip n"
> > > > > will do.
> > > > > 
> > > > > > Another solution is to let the driver send the address-claimed message
> > > > > > waiting or without waiting 250 ms for successive messages depending on
> > > > > > the case.
> > > > > 
> > > > > > You can send "address-claimed message" at any time you want. Kernel will
> > > > > just not resolve the NAME to address until 1. part of the spec will
> > > > > apply. Do not forget, the NAME cache is used for local _and_ remote
> > > > > names. You can trick out local system, not remote.
> > > > > 
> > > > > Even if you implement "smart" logic in user space and will know better
> > > > > > than kernel, that this application is responding to RfAC. You will never
> > > > > know if address-claimed message of remote system is a response to RfAC.
> > > > > 
> > > > > From this perspective, I do not know, how allowing the user space break
> > > > > the rules will help to solve the problem?
> > > > 
> > > > I think you did not understand this last proposal: since the driver is
> > > > already implementing part of the standard, then it might as well send
> > > > the address-claimed message when needed and wait 250 ms or not depending
> > > > on the case.
> > > 
> > > Let's try following test:
> > > j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 &
> > > while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done
> > > 
> > > And start candump with delta time stamps:
> > > :~ candump -t d vcan0                                                 
> > >  (000.000000)  vcan0  18EAFFFE   [3]  00 EE 00               
> > >  (000.002437)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- no 250ms delay
> > >  (000.011458)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.011964)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.011712)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012585)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012891)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012082)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012604)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012357)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012790)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012765)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012483)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012680)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012144)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > ... snip ...
> > >  (000.012592)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012515)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.013183)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012653)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.011886)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012836)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.009494)  vcan0  18EEFF80   [8]  77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed 
> > >  (000.003362)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 usecs after previous. No 250ms delay.
> > >  (000.012351)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012983)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012602)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012594)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.012348)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > >  (000.011922)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > 
> > > As you can see, the j1939 stack does not force applications to use NAMEs and
> > > does not prevent sending any message within the 250 ms delay. The only thing
> > > that has the 250 ms timer is NAME-to-address resolution, which should be fixed
> > > with respect to 4.5.2.d without breaking everything else.
> > 
> > Yes this is clear, this is working because the socket used by testj1939
> > is not bound to any name.
> > 
> > Just to clarify: are you suggesting that application developers use one
> > socket (bound with the name) to manage the address-claim and another one
> > (bound without the name) for other transmissions? If so, then why does that
> > code exist in the driver?
> > Honestly I would consider this proposal really bad since it would
> > allow the standard to be violated completely. I really hope you agree with
> > me about this.
> 
> Hm... you are right.
> 
> Please add code comments to your patch with snippets from the standard and a
> clarification of why it should be so. Commit messages are often
> overlooked.
> 
> Regards,
> Oleksij

Would the following comment be acceptable? Isn't it too long?

The ISO 11783-5 standard, in "4.5.2 - Address claim requirements",
states:
  d) No CF shall begin, or resume, transmission on the network until 250
     ms after it has successfully claimed an address except when
     responding to a request for address-claimed.
But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
prioritization" show that the CF begins the transmission 250 ms after
the first AC (address-claimed) message even if it sends another AC
message during that time window to resolve the address contention with
another CF.
As stated in "4.4.2.3 - Address-claimed message":
  In order to successfully claim an address, the CF sending an address
  claimed message shall not receive a contending claim from another CF
  for at least 250 ms.
As stated in "4.4.3.2 - NAME management (NM) message":
  1) A commanding CF can
     d) request that a CF with a specified NAME transmit the address-
        claimed message with its current NAME.
  2) A target CF shall
     d) send an address-claimed message in response to a request for a 
        matching NAME
Taking the above arguments into account, the 250 ms wait is required
only during network initialization.
Do not restart the timer on an AC message if both the NAME and the address
match, i.e. if the address has already been claimed (timer has
expired) or the AC message has been sent to resolve the contention with
another CF (timer is still running).
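
For illustration only, the rule in the last paragraph can be modelled in
user space roughly as below. This is a minimal sketch under stated
assumptions, not the driver code: struct ecu_model, ac_process() and the
timer_running flag are hypothetical names invented for this example, and
the 250 ms timer is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-ECU state (assumption). */
struct ecu_model {
	uint64_t name;		/* 64-bit NAME of the CF */
	uint8_t addr;		/* source address tied to that NAME */
	bool known;		/* a claim for this NAME was already seen */
	bool timer_running;	/* 250 ms hold-off still pending */
};

/*
 * Process one address-claimed (AC) message.
 * Returns true if the 250 ms timer was (re)started, false if the
 * message repeats an existing NAME/address pair and the timer state
 * was left untouched.
 */
static bool ac_process(struct ecu_model *ecu, uint64_t name, uint8_t sa)
{
	if (ecu->known && ecu->name == name && ecu->addr == sa)
		return false;	/* same NAME and address: nothing to do */

	ecu->name = name;
	ecu->addr = sa;
	ecu->known = true;
	ecu->timer_running = true;	/* new or changed claim: wait again */
	return true;
}

/* Small self-check walking through the cases named in the comment. */
static void ac_model_selftest(void)
{
	struct ecu_model ecu = { 0 };
	const uint64_t name = 0x0011223344556677ULL;

	/* First claim: the timer starts. */
	assert(ac_process(&ecu, name, 0x80));
	assert(ecu.timer_running);

	/* Repeated AC during contention: the timer keeps running. */
	assert(!ac_process(&ecu, name, 0x80));
	assert(ecu.timer_running);

	/* Timer expired, then a repeated AC: no new wait. */
	ecu.timer_running = false;
	assert(!ac_process(&ecu, name, 0x80));
	assert(!ecu.timer_running);

	/* Claim of a different address: wait again. */
	assert(ac_process(&ecu, name, 0x81));
	assert(ecu.timer_running);
}
```

Calling ac_model_selftest() exercises the first claim, a repeated claim
while the timer is running, a repeated claim after expiry, and a claim of
a different address.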

Thank you,
Devid


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed
  2022-11-23 20:39                                           ` Devid Antonio Filoni
@ 2022-11-24  5:16                                             ` Oleksij Rempel
  2022-11-25 17:04                                               ` [PATCH v2] can: j1939: do not wait 250 ms " Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-24  5:16 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Kurt Van Dijck, kbuild test robot, Maxime Jayat, Oliver Hartkopp,
	David Jander, linux-kernel, Oleksij Rempel, netdev,
	Marc Kleine-Budde, kernel, Robin van der Gracht, Jakub Kicinski,
	Paolo Abeni, David S. Miller, linux-can

On Wed, Nov 23, 2022 at 09:39:06PM +0100, Devid Antonio Filoni wrote:
> On Mon, 2022-11-21 at 06:19 +0100, Oleksij Rempel wrote:
> > On Sun, Nov 20, 2022 at 08:18:32PM +0100, Devid Antonio Filoni wrote:
> > > On Sun, 2022-11-20 at 09:45 +0100, Oleksij Rempel wrote:
> > > > On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote:
> > > > > On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote:
> > > > > > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote:
> > > > > > > Hi Oleksij,
> > > > > > > 
> > > > > > > honestly I would apply proposed patch because it is the easier solution
> > > > > > > and makes the driver compliant with the standard for the following
> > > > > > > reasons:
> > > > > > > - on the first claim, the kernel will wait 250 ms as stated by the
> > > > > > > standard
> > > > > > > + on successive claims with the same name, the kernel will not wait
> > > > > > > 250ms, this implies:
> > > > > > >   - it will not wait after sending the address-claimed message when the
> > > > > > > claimed address has been spoofed, but the standard does not explicitly
> > > > > > > > state what to do in this case (see previous emails in this thread), so
> > > > > > > it would be up to the application developer to decide how to manage the
> > > > > > > conflict
> > > > > > >   - it will not wait after sending the address-claimed message when a
> > > > > > > request for address-claimed message has been received as stated by the
> > > > > > > standard
> > > > > > 
> > > > > > Standard says:
> > > > > > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms
> > > > > >    after it has successfully claimed an address (Figure 4).
> > > > > > 2. This does not apply when responding to a request for address claimed.
> > > > > > 
> > > > > > With current patch state: 1. is implemented and working as expected, 2.
> > > > > > is not implemented.
> > > > > > With this patch: 1. is partially broken and 2. is partially faking
> > > > > > needed behavior.
> > > > > > 
> > > > > > It will not wait if remote ECU which rebooted for some reasons. With this patch
> > > > > > we are breaking one case of the standard in favor to fake compatibility to the
> > > > > > other case. We should avoid waiting only based on presence of RfAC not based
> > > > > > on the old_addr == new_addr.
> > > > > 
> > > > > I'm sorry, I don't think I understood the point about reboot ("It will
> > > > > not wait if remote ECU which rebooted for some reasons"). If another ECU
> > > > > rebooted, then *it* will have to perform the claim procedure again
> > > > > waiting 250 ms before beginning the transmission. Your ECU doesn't have
> > > > > to check if the other ECUs respected the 250 ms wait.
> > > > 
> > > > With proposed patch:
> > > > - local application which is sending to the remote NAME, will start or continue
> > > >   communication with ECU which should stay silent.
> > > 
> > > And this is not forbidden by the standard, the standard states that the
> > > remote ECU shall not start or continue the communication but it can
> > > *receive* messages.
> > > For example, what would you do if:
> > > - during the 250 ms wait, another ECU sends a request-for-address-
> > > claimed message meant to the address you're claiming?
> > > From "4.5.3 Other requirements for initialization":
> > > A CF shall respond to a request-for-address-claimed message when the
> > > destination address is the same as the CF's address and shall transmit
> > > its response to the Global address (255).
> > > - during the 250 ms wait another ECU sends a normal message (non
> > > address-claim related) using the SA you're currently claiming?
> > > 
> > > > - local application which was manually or automatically restarted (see
> > > >   application watchdogs), will bypass address claim procedure
> > > >   completion and start sending without 250ms delay.
> > > 
> > > Then the application will be violating the standard, you're right,
> > > however please note that, as per driver implementation, each time the
> > > socket is closed and opened again (if bound with a name) you have to
> > > send the address-claimed message again.
> > > The standard also states how to treat this kind of violations on the
> > > remote ECU side.
> > > 
> > > > 
> > > > > Also, the ISO11783-5 standard, with "Figure 6 (Resolving address
> > > > > contention between two self-configurable-address CF)" of "4.5.4.2 -
> > > > > Address-claim prioritization", shows that:
> > > > > - ECU1 claims the address (time: 0 ms)
> > > > > - ECU2 claims the same address (time: 0+x ms)
> > > > > - ECU1 NAME has the higher priority, so ECU1 sends again the address
> > > > > claimed message as soon as it received the address-claim from ECU2
> > > > > (time: 0+x+y ms)
> > > > > - ECU1 starts normal transmission (time: 250 ms)
> > > > > With current implementation, the ECU1 would start the transmission at
> > > > > time 0+x+y+250 ms, with proposed patch it would not.
> > > > 
> > > > You are right, this should be fixed.
> > > > But proposed patch closes one issues and opens another, with this patch it will
> > > > be enough to send at least two address claimed messages to bypass the delay.
> > > 
> > > No, because the timer associated with the first claim *is not stopped*.
> > > 
> > > > 
> > > > > > The same is shown in "Figure 7 (Resolving address contention between a non-
> > > > > > configurable address CF and a self-configurable address CF)", the ECU
> > > > > waits again 250 ms only when claiming a different address.
> > > > 
> > > > Ack
> > > > 
> > > > > Also, as previously discussed in this thread, the standard states in
> > > > > 4.4.4.3 - Address violation:
> > > > > If a CF receives a message, other than the address-claimed message,
> > > > > which uses the CF's own SA,
> > > > > then the CF:
> > > > > - shall send the address-claim message to the Global address;
> > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI =
> > > > > 31
> > > > > It is not *explicitly* stated that you have to wait 250 ms after the
> > > > > address-claim message has been sent.
> > > > 
> > > > There is no need to explicitly state it. The requirement is clearly described
> > > > in the 4.5.2.d part 1 with clearly defined exception in  4.5.2.d part 2.
> > > > If something is not explicitly stated, the stated requirement has always
> > > > priority.
> > > > 
> > > > > Please note that the 250 ms wait is  mentioned only in "4.5 - Network
> > > > > initialization"
> > > > 
> > > > OK, we need to refer to the wording used in a specifications, in
> > > > general:
> > > > Shall – Shall is used to designate a mandatory requirement.
> > > > Should – Should is used for requirements that are considered good and are
> > > >          recommended, but are not absolutely mandatory.
> > > > > May – May is used for requirements that are optional.
> > > > 
> > > > If a requirement with strong wording as "shall" is not strong enough for
> > > > > you and you are using words as ".. mentioned only in .." then even a
> > > > statistical analysis of this spec will have no meaning. In all
> > > > cases we can just invalidate all arguments by using: it is only X or Y. 
> > > > 
> > > > > while above statements come from "4.4 - Network-management procedures".
> > > > > Also in this case, the proposed patch is still standard compliant.
> > > > 
> > > > If we remove 4.5.2.d from the spec, then yes.
> > > > 
> > > > > So I'm sorry but I have to disagree with you, there are many things
> > > > > broken in the current implementation because it is forcing the 250 wait
> > > > > to all cases but it should not.
> > > > 
> > > > If we remove 4.5.2.d from the spec, then yes. Every construction is
> > > > logical if we adopt input variables to the construction.
> > > 
> > > From "4.4.4.3 - Address violation":
> > > - *shall send the address-claim message* to the Global address
> > > From "4.5.2 Address claim requirements":
> > > - No CF shall begin, or resume, transmission on the network until 250 ms
> > > after it has successfully *claimed an address*, except when responding
> > > to a request for address-claimed.
> > > 
> > > Do you see any difference?
> > > With your interpretation of the standard, then above 4.5.2.d sentence
> > > shall be:
> > > - No CF shall begin, or resume, transmission on the network until 250 ms
> > > after it has successfully *sent the address-claim message*, except when
> > > responding to a request for address-claimed.
> > > 
> > > I think "it has successfully claimed an address" is valid for the whole
> > > claim procedure and not for the address-claimed message only.
> > > 
> > > Please note that the ECU shall send the address-claim message also when
> > > it receives a request for a matching NAME ("4.4.3.2 NAME management (NM)
> > > message"). This does not mean that is claiming again the address.
> > > 
> > > > 
> > > > > > Without words 2. part should be implemented without breaking 1.
> > > > > > 
> > > > > > > Otherwise you will have to keep track of above cases and decide if the
> > > > > > > > wait is needed or not, but this is hard to accomplish because it is the
> > > > > > > application in charge of sending the address-claimed message, so you
> > > > > > > would have to decide how much to keep track of the request for address-
> > > > > > > claimed message thus adding more complexity to the code of the driver.
> > > > > > 
> > > > > > Current kernel already tracks all claims on the bus and knows all registered
> > > > > > > NAMEs. I do not see increased complexity in this case.
> > > > > 
> > > > > The kernel tracks the claims but it does *not track* incoming requests
> > > > > for address-claimed message, it would have to and it would have to
> > > > 
> > > > yes
> > > > 
> > > > > allow the application to answer to it *within a defined time window*.
> > > > 
> > > > yes.
> > > > 
> > > > > But keep in mind that there are other cases when the 250 ms wait is wrong
> > > > > or it is not explicitly stated by the standard.
> > > > 
> > > > If it is not stated in the standard how can we decide if it is wrong?
> > > And how can we decide if it is right? :)
> > > 
> > > > And if strongly worded statements have no value just because it is
> > > > stated only one time, how proper standard should look like? 
> > > See above.
> > > 
> > > > 
> > > > > > > IMHO, the only missing part is a user space interface. Something like "ip n"
> > > > > > will do.
> > > > > > 
> > > > > > > Another solution is to let the driver send the address-claimed message
> > > > > > > waiting or without waiting 250 ms for successive messages depending on
> > > > > > > the case.
> > > > > > 
> > > > > > > You can send "address-claimed message" at any time you want. Kernel will
> > > > > > just not resolve the NAME to address until 1. part of the spec will
> > > > > > apply. Do not forget, the NAME cache is used for local _and_ remote
> > > > > > names. You can trick out local system, not remote.
> > > > > > 
> > > > > > Even if you implement "smart" logic in user space and will know better
> > > > > > > than kernel, that this application is responding to RfAC. You will never
> > > > > > know if address-claimed message of remote system is a response to RfAC.
> > > > > > 
> > > > > > From this perspective, I do not know, how allowing the user space break
> > > > > > the rules will help to solve the problem?
> > > > > 
> > > > > I think you did not understand this last proposal: since the driver is
> > > > > already implementing part of the standard, then it might as well send
> > > > > the address-claimed message when needed and wait 250 ms or not depending
> > > > > on the case.
> > > > 
> > > > Let's try following test:
> > > > j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 &
> > > > while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done
> > > > 
> > > > And start candump with delta time stamps:
> > > > :~ candump -t d vcan0                                                 
> > > >  (000.000000)  vcan0  18EAFFFE   [3]  00 EE 00               
> > > >  (000.002437)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- no 250ms delay
> > > >  (000.011458)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.011964)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.011712)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012585)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012891)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012082)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012604)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012357)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012790)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012765)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012483)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012680)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012144)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > > ... snip ...
> > > >  (000.012592)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012515)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.013183)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012653)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.011886)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012836)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.009494)  vcan0  18EEFF80   [8]  77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed 
> > > >  (000.003362)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 usecs after previous. No 250ms delay.
> > > >  (000.012351)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012983)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012602)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012594)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.012348)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > >  (000.011922)  vcan0  19239080   [8]  01 23 45 67 89 AB CD EF
> > > > 
> > > > As you can see, the j1939 stack does not force applications to use NAMEs and
> > > > does not prevent sending any message within the 250 ms delay. The only thing
> > > > that has the 250 ms timer is NAME-to-address resolution, which should be fixed
> > > > with respect to 4.5.2.d without breaking everything else.
> > > 
> > > Yes this is clear, this is working because the socket used by testj1939
> > > is not bound to any name.
> > > 
> > > Just to clarify: are you suggesting that application developers use one
> > > socket (bound with the name) to manage the address-claim and another one
> > > (bound without the name) for other transmissions? If so, then why does that
> > > code exist in the driver?
> > > Honestly I would consider this proposal really bad since it would
> > > allow the standard to be violated completely. I really hope you agree with
> > > me about this.
> > 
> > Hm... you are right.
> > 
> > Please add code comments to your patch with snippets from the standard
> > and an explanation of why it should be so. Commit messages are often
> > overlooked.
> > 
> > Regards,
> > Oleksij
> 
> Would the following comment be acceptable? Isn't it too long?
> 
> The ISO 11783-5 standard, in "4.5.2 - Address claim requirements",
> states:
>   d) No CF shall begin, or resume, transmission on the network until 250
>      ms after it has successfully claimed an address except when
>      responding to a request for address-claimed.
> But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
> prioritization" show that the CF begins the transmission after 250 ms
> from the first AC (address-claimed) message even if it sends another AC
> message during that time window to resolve the address contention with
> another CF.
> As stated in "4.4.2.3 - Address-claimed message":
>   In order to successfully claim an address, the CF sending an address
>   claimed message shall not receive a contending claim from another CF
>   for at least 250 ms.
> As stated in "4.4.3.2 - NAME management (NM) message":
>   1) A commanding CF can
>      d) request that a CF with a specified NAME transmit the address-
>         claimed message with its current NAME.
>   2) A target CF shall
>      d) send an address-claimed message in response to a request for a 
>         matching NAME
> Taking the above arguments into account, the 250 ms wait is requested
> only during network initialization.
> Do not restart the timer on AC message if both the NAME and the address
> match and therefore if the address has already been claimed (timer has
> expired) or the AC message has been sent to resolve the contention with
> another CF (timer is still running).

Sounds good.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
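
As a reading aid for the candump trace quoted above, here is a minimal
user-space sketch that splits a 29-bit J1939 identifier into its fields
and converts a delta timestamp to microseconds. The names
j1939_id_fields, j1939_id_decode and candump_parse are made up for this
illustration and are not part of the kernel or of can-utils. The parser
assumes unquoted candump lines with delta timestamps carrying exactly
six fractional digits, as in the trace.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of a 29-bit J1939 identifier (illustrative names). */
struct j1939_id_fields {
	uint8_t prio;  /* 3-bit priority */
	uint32_t pgn;  /* parameter group number */
	uint8_t da;    /* destination address, 0xff for PDU2 (broadcast) */
	uint8_t sa;    /* source address */
};

/* Split a 29-bit CAN identifier into its J1939 fields. */
static struct j1939_id_fields j1939_id_decode(uint32_t can_id)
{
	struct j1939_id_fields id;
	uint8_t pf = (can_id >> 16) & 0xff; /* PDU format */
	uint8_t ps = (can_id >> 8) & 0xff;  /* PDU specific */

	id.prio = (can_id >> 26) & 0x7;
	id.sa = can_id & 0xff;
	/* EDP, DP and PF always belong to the PGN. */
	id.pgn = (can_id >> 8) & 0x3ff00;
	if (pf < 240) {
		/* PDU1: PS carries the destination address. */
		id.da = ps;
	} else {
		/* PDU2: PS is part of the PGN, no destination field. */
		id.pgn |= ps;
		id.da = 0xff;
	}
	return id;
}

/* Parse one candump line with a delta timestamp, e.g.
 * " (000.003362)  vcan0  19239080   [8]  01 23 ..."
 * Returns 0 on success, -1 if the line does not match.
 */
static int candump_parse(const char *line, long *delta_us, uint32_t *can_id)
{
	long sec, usec;
	unsigned int id;

	if (sscanf(line, " (%ld.%6ld) %*s %x", &sec, &usec, &id) != 3)
		return -1;
	*delta_us = sec * 1000000L + usec;
	*can_id = id;
	return 0;
}
```

Applied to the trace, 18EEFF80 decodes to priority 6, PGN 0xEE00
(address claimed), destination 0xFF and source 0x80, while 19239080
decodes to PGN 0x12300 sent from 0x80 to 0x90. The delta (000.003362)
is 3362 us, far below the 250 ms window under discussion.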


* [PATCH v2] can: j1939: do not wait 250 ms if the same addr was already claimed
  2022-11-24  5:16                                             ` Oleksij Rempel
@ 2022-11-25 17:04                                               ` Devid Antonio Filoni
  2022-11-26 10:28                                                 ` Oleksij Rempel
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2022-11-25 17:04 UTC (permalink / raw)
  To: Robin van der Gracht, Oleksij Rempel
  Cc: kernel, Oliver Hartkopp, Marc Kleine-Budde, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-can, netdev,
	linux-kernel, Devid Antonio Filoni

The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
  d) No CF shall begin, or resume, transmission on the network until 250
     ms after it has successfully claimed an address except when
     responding to a request for address-claimed.

But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
prioritization" show that the CF begins the transmission after 250 ms
from the first AC (address-claimed) message even if it sends another AC
message during that time window to resolve the address contention with
another CF.

As stated in "4.4.2.3 - Address-claimed message":
  In order to successfully claim an address, the CF sending an address
  claimed message shall not receive a contending claim from another CF
  for at least 250 ms.

As stated in "4.4.3.2 - NAME management (NM) message":
  1) A commanding CF can
     d) request that a CF with a specified NAME transmit the address-
        claimed message with its current NAME.
  2) A target CF shall
     d) send an address-claimed message in response to a request for a
        matching NAME

Taking the above arguments into account, the 250 ms wait is requested
only during network initialization.

Do not restart the timer on AC message if both the NAME and the address
match and so if the address has already been claimed (timer has expired)
or the AC message has been sent to resolve the contention with another
CF (timer is still running).

Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>
---
 v1 -> v2: Added ISO 11783-5 standard references

 net/can/j1939/address-claim.c | 40 +++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/net/can/j1939/address-claim.c b/net/can/j1939/address-claim.c
index f33c47327927..ca4ad6cdd5cb 100644
--- a/net/can/j1939/address-claim.c
+++ b/net/can/j1939/address-claim.c
@@ -165,6 +165,46 @@ static void j1939_ac_process(struct j1939_priv *priv, struct sk_buff *skb)
 	 * leaving this function.
 	 */
 	ecu = j1939_ecu_get_by_name_locked(priv, name);
+
+	if (ecu && ecu->addr == skcb->addr.sa) {
+		/* The ISO 11783-5 standard, in "4.5.2 - Address claim
+		 * requirements", states:
+		 *   d) No CF shall begin, or resume, transmission on the
+		 *      network until 250 ms after it has successfully claimed
+		 *      an address except when responding to a request for
+		 *      address-claimed.
+		 *
+		 * But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
+		 * prioritization" show that the CF begins the transmission
+		 * after 250 ms from the first AC (address-claimed) message
+		 * even if it sends another AC message during that time window
+		 * to resolve the address contention with another CF.
+		 *
+		 * As stated in "4.4.2.3 - Address-claimed message":
+		 *   In order to successfully claim an address, the CF sending
+		 *   an address claimed message shall not receive a contending
+		 *   claim from another CF for at least 250 ms.
+		 *
+		 * As stated in "4.4.3.2 - NAME management (NM) message":
+		 *   1) A commanding CF can
+		 *      d) request that a CF with a specified NAME transmit
+		 *         the address-claimed message with its current NAME.
+		 *   2) A target CF shall
+		 *      d) send an address-claimed message in response to a
+		 *         request for a matching NAME
+		 *
+		 * Taking the above arguments into account, the 250 ms wait is
+		 * requested only during network initialization.
+		 *
+		 * Do not restart the timer on AC message if both the NAME and
+		 * the address match and so if the address has already been
+		 * claimed (timer has expired) or the AC message has been sent
+		 * to resolve the contention with another CF (timer is still
+		 * running).
+		 */
+		goto out_ecu_put;
+	}
+
 	if (!ecu && j1939_address_is_unicast(skcb->addr.sa))
 		ecu = j1939_ecu_create_locked(priv, name);
 
-- 
2.34.1
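
The effect of the early exit added by this patch can be illustrated
with a small user-space model. This is only a sketch under simplifying
assumptions: one ECU, millisecond timestamps passed in by the caller,
and hypothetical names (ecu_model, ac_process_model, ecu_may_transmit)
that do not exist in net/can/j1939. It is not the kernel
implementation, which works on refcounted ECU objects and an hrtimer.

```c
#include <stdbool.h>
#include <stdint.h>

#define AC_HOLD_MS 250 /* hold-off after an address claim */

/* Hypothetical, simplified stand-in for the per-ECU state. */
struct ecu_model {
	uint64_t name;        /* 64-bit NAME */
	uint8_t addr;         /* claimed source address */
	bool known;           /* false until the first claim is seen */
	uint64_t ready_at_ms; /* time at which the claim counts as done */
};

/* Handle an address-claimed message.
 * Returns true if the hold-off timer was (re)started.
 */
static bool ac_process_model(struct ecu_model *ecu, uint64_t name,
			     uint8_t sa, uint64_t now_ms)
{
	if (ecu->known && ecu->name == name && ecu->addr == sa) {
		/* Same NAME and same address: a repeated claim.
		 * Leave the timer untouched, whether it is still
		 * running or has already expired.
		 */
		return false;
	}

	/* New NAME or new address: start the 250 ms hold-off. */
	ecu->name = name;
	ecu->addr = sa;
	ecu->known = true;
	ecu->ready_at_ms = now_ms + AC_HOLD_MS;
	return true;
}

/* True once the hold-off after the first claim has elapsed. */
static bool ecu_may_transmit(const struct ecu_model *ecu, uint64_t now_ms)
{
	return ecu->known && now_ms >= ecu->ready_at_ms;
}
```

In this model, a claim at t = 0 followed by an identical claim at
t = 100 ms still allows transmission at t = 250 ms, whereas a claim for
a different address starts a fresh 250 ms window.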



* Re: [PATCH v2] can: j1939: do not wait 250 ms if the same addr was already claimed
  2022-11-25 17:04                                               ` [PATCH v2] can: j1939: do not wait 250 ms " Devid Antonio Filoni
@ 2022-11-26 10:28                                                 ` Oleksij Rempel
  2023-02-07 13:50                                                   ` Devid Antonio Filoni
  0 siblings, 1 reply; 28+ messages in thread
From: Oleksij Rempel @ 2022-11-26 10:28 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Robin van der Gracht, Oleksij Rempel, kernel, Oliver Hartkopp,
	Marc Kleine-Budde, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-can, netdev, linux-kernel

On Fri, Nov 25, 2022 at 06:04:18PM +0100, Devid Antonio Filoni wrote:
> The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", states:
>   d) No CF shall begin, or resume, transmission on the network until 250
>      ms after it has successfully claimed an address except when
>      responding to a request for address-claimed.
> 
> But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
> prioritization" show that the CF begins the transmission after 250 ms
> from the first AC (address-claimed) message even if it sends another AC
> message during that time window to resolve the address contention with
> another CF.
> 
> As stated in "4.4.2.3 - Address-claimed message":
>   In order to successfully claim an address, the CF sending an address
>   claimed message shall not receive a contending claim from another CF
>   for at least 250 ms.
> 
> As stated in "4.4.3.2 - NAME management (NM) message":
>   1) A commanding CF can
>      d) request that a CF with a specified NAME transmit the address-
>         claimed message with its current NAME.
>   2) A target CF shall
>      d) send an address-claimed message in response to a request for a
>         matching NAME
> 
> Taking the above arguments into account, the 250 ms wait is requested
> only during network initialization.
> 
> Do not restart the timer on AC message if both the NAME and the address
> match and so if the address has already been claimed (timer has expired)
> or the AC message has been sent to resolve the contention with another
> CF (timer is still running).
> 
> Signed-off-by: Devid Antonio Filoni <devid.filoni@egluetechnologies.com>

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>

> ---
>  v1 -> v2: Added ISO 11783-5 standard references
> 
>  net/can/j1939/address-claim.c | 40 +++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/net/can/j1939/address-claim.c b/net/can/j1939/address-claim.c
> index f33c47327927..ca4ad6cdd5cb 100644
> --- a/net/can/j1939/address-claim.c
> +++ b/net/can/j1939/address-claim.c
> @@ -165,6 +165,46 @@ static void j1939_ac_process(struct j1939_priv *priv, struct sk_buff *skb)
>  	 * leaving this function.
>  	 */
>  	ecu = j1939_ecu_get_by_name_locked(priv, name);
> +
> +	if (ecu && ecu->addr == skcb->addr.sa) {
> +		/* The ISO 11783-5 standard, in "4.5.2 - Address claim
> +		 * requirements", states:
> +		 *   d) No CF shall begin, or resume, transmission on the
> +		 *      network until 250 ms after it has successfully claimed
> +		 *      an address except when responding to a request for
> +		 *      address-claimed.
> +		 *
> +		 * But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim
> +		 * prioritization" show that the CF begins the transmission
> +		 * after 250 ms from the first AC (address-claimed) message
> +		 * even if it sends another AC message during that time window
> +		 * to resolve the address contention with another CF.
> +		 *
> +		 * As stated in "4.4.2.3 - Address-claimed message":
> +		 *   In order to successfully claim an address, the CF sending
> +		 *   an address claimed message shall not receive a contending
> +		 *   claim from another CF for at least 250 ms.
> +		 *
> +		 * As stated in "4.4.3.2 - NAME management (NM) message":
> +		 *   1) A commanding CF can
> +		 *      d) request that a CF with a specified NAME transmit
> +		 *         the address-claimed message with its current NAME.
> +		 *   2) A target CF shall
> +		 *      d) send an address-claimed message in response to a
> +		 *         request for a matching NAME
> +		 *
> +		 * Taking the above arguments into account, the 250 ms wait is
> +		 * requested only during network initialization.
> +		 *
> +		 * Do not restart the timer on AC message if both the NAME and
> +		 * the address match and so if the address has already been
> +		 * claimed (timer has expired) or the AC message has been sent
> +		 * to resolve the contention with another CF (timer is still
> +		 * running).
> +		 */
> +		goto out_ecu_put;
> +	}
> +
>  	if (!ecu && j1939_address_is_unicast(skcb->addr.sa))
>  		ecu = j1939_ecu_create_locked(priv, name);
>  
> -- 
> 2.34.1
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH v2] can: j1939: do not wait 250 ms if the same addr was already claimed
  2022-11-26 10:28                                                 ` Oleksij Rempel
@ 2023-02-07 13:50                                                   ` Devid Antonio Filoni
  2023-02-07 14:05                                                     ` Marc Kleine-Budde
  0 siblings, 1 reply; 28+ messages in thread
From: Devid Antonio Filoni @ 2023-02-07 13:50 UTC (permalink / raw)
  To: Oleksij Rempel, Robin van der Gracht, Oliver Hartkopp, Marc Kleine-Budde
  Cc: Oleksij Rempel, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-can, netdev, linux-kernel

On Sat, 2022-11-26 at 11:28 +0100, Oleksij Rempel wrote:
[...]

Hello,
I noticed that this patch has not been integrated upstream yet. Are
there any problems with it?

Thank you,
Devid


* Re: [PATCH v2] can: j1939: do not wait 250 ms if the same addr was already claimed
  2023-02-07 13:50                                                   ` Devid Antonio Filoni
@ 2023-02-07 14:05                                                     ` Marc Kleine-Budde
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Kleine-Budde @ 2023-02-07 14:05 UTC (permalink / raw)
  To: Devid Antonio Filoni
  Cc: Oleksij Rempel, Robin van der Gracht, Oliver Hartkopp,
	Oleksij Rempel, kernel, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-can, netdev, linux-kernel

On 07.02.2023 14:50:15, Devid Antonio Filoni wrote:
[...]
> I noticed that this patch has not been integrated in upstream yet. Are
> there problems with it?

Thanks for the heads up, I've sent a PR.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |



end of thread, other threads:[~2023-02-07 14:06 UTC | newest]

Thread overview: 28+ messages
2022-05-09 17:03 [PATCH RESEND] can: j1939: do not wait 250ms if the same addr was already claimed Devid Antonio Filoni
2022-05-09 19:04 ` Kurt Van Dijck
2022-05-10  4:26   ` Oleksij Rempel
2022-05-10 11:00     ` Devid Antonio Filoni
2022-05-11  8:47       ` Oleksij Rempel
2022-05-11  9:06         ` David Jander
2022-05-11 12:55           ` Devid Antonio Filoni
2022-05-11 14:22             ` David Jander
2022-05-13  9:46               ` Devid Antonio Filoni
2022-11-17 14:08                 ` Devid Antonio Filoni
2022-11-17 15:22                   ` David Jander
2022-11-18  6:06                     ` Oleksij Rempel
2022-11-18 10:25                       ` Devid Antonio Filoni
2022-11-18 12:30                         ` Oleksij Rempel
2022-11-18 12:41                           ` Devid Antonio Filoni
2022-11-18 13:44                             ` Oleksij Rempel
2022-11-18 15:12                               ` Devid Antonio Filoni
2022-11-19 10:12                                 ` Oleksij Rempel
2022-11-20  0:11                                   ` Devid Antonio Filoni
2022-11-20  8:45                                     ` Oleksij Rempel
2022-11-20 19:18                                       ` Devid Antonio Filoni
2022-11-21  5:19                                         ` Oleksij Rempel
2022-11-23 20:39                                           ` Devid Antonio Filoni
2022-11-24  5:16                                             ` Oleksij Rempel
2022-11-25 17:04                                               ` [PATCH v2] can: j1939: do not wait 250 ms " Devid Antonio Filoni
2022-11-26 10:28                                                 ` Oleksij Rempel
2023-02-07 13:50                                                   ` Devid Antonio Filoni
2023-02-07 14:05                                                     ` Marc Kleine-Budde
