All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] can: bittiming: improve SJW handling and default
@ 2022-09-07 10:38 Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath

Hello,

several people noticed that on modern CAN controllers with wide bit
timing registers the default SJW of 1 can result in unstable or no
synchronization to the CAN network. See Patch 5/5 for details.

This series fixes the issue by first cleaning up the SJW handling,
making it more standard compliant (SJW must not be longer than any
Phase Seg - not just Phase Seg2) and finally using Phase Seg2 / 2 as
default, if the user space provided no SJW configuration.

regards,
Marc




^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid
  2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
@ 2022-09-07 10:38 ` Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath, Marc Kleine-Budde

Implement a function can_bittiming_const_valid() to check the validity
of the provided bit timing constant. Call this function from
register_candev() to check the bit timing constants during CAN
interface registration.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev/dev.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/can/dev/dev.c b/drivers/net/can/dev/dev.c
index c1956b1e9faf..3b51055be40e 100644
--- a/drivers/net/can/dev/dev.c
+++ b/drivers/net/can/dev/dev.c
@@ -498,6 +498,18 @@ static int can_get_termination(struct net_device *ndev)
 	return 0;
 }
 
+static bool
+can_bittiming_const_valid(const struct can_bittiming_const *btc)
+{
+	if (!btc)
+		return true;
+
+	if (!btc->sjw_max)
+		return false;
+
+	return true;
+}
+
 /* Register the CAN network device */
 int register_candev(struct net_device *dev)
 {
@@ -518,6 +530,10 @@ int register_candev(struct net_device *dev)
 	if (!priv->data_bitrate_const != !priv->data_bitrate_const_cnt)
 		return -EINVAL;
 
+	if (!can_bittiming_const_valid(priv->bittiming_const) ||
+	    !can_bittiming_const_valid(priv->data_bittiming_const))
+		return -EINVAL;
+
 	if (!priv->termination_const) {
 		err = can_get_termination(dev);
 		if (err)
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
@ 2022-09-07 10:38 ` Marc Kleine-Budde
  2022-09-11  8:24   ` Vincent Mailhol
  2022-09-07 10:38 ` [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing Marc Kleine-Budde
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath, Marc Kleine-Budde

In can_calc_bittiming() there are several open coded checks to ensure
that SJW is within certain limits. Replace this by a single call to
min3().

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev/calc_bittiming.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/can/dev/calc_bittiming.c b/drivers/net/can/dev/calc_bittiming.c
index d3caa040614d..ce6bef2444a2 100644
--- a/drivers/net/can/dev/calc_bittiming.c
+++ b/drivers/net/can/dev/calc_bittiming.c
@@ -158,12 +158,8 @@ int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt,
 	if (!bt->sjw || !btc->sjw_max) {
 		bt->sjw = 1;
 	} else {
-		/* bt->sjw is at least 1 -> sanitize upper bound to sjw_max */
-		if (bt->sjw > btc->sjw_max)
-			bt->sjw = btc->sjw_max;
-		/* bt->sjw must not be higher than tseg2 */
-		if (tseg2 < bt->sjw)
-			bt->sjw = tseg2;
+		/* sjw must not be higher than sjw_max and tseg2 */
+		bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
 	}
 
 	bt->brp = best_brp;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing
  2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
@ 2022-09-07 10:38 ` Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
  4 siblings, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath, Marc Kleine-Budde

Prepare the SJW handling in can_calc_bittiming() for the changes
coming in the next patches. Always ensure that SJW in within the
limits.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev/calc_bittiming.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/can/dev/calc_bittiming.c b/drivers/net/can/dev/calc_bittiming.c
index ce6bef2444a2..cb9521c8ae8e 100644
--- a/drivers/net/can/dev/calc_bittiming.c
+++ b/drivers/net/can/dev/calc_bittiming.c
@@ -154,13 +154,12 @@ int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt,
 	bt->phase_seg1 = tseg1 - bt->prop_seg;
 	bt->phase_seg2 = tseg2;
 
-	/* check for sjw user settings */
-	if (!bt->sjw || !btc->sjw_max) {
+	/* If user space provides no sjw, use 1 as default */
+	if (!bt->sjw)
 		bt->sjw = 1;
-	} else {
-		/* sjw must not be higher than sjw_max and tseg2 */
-		bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
-	}
+
+	/* sjw must not be higher than sjw_max and tseg2 */
+	bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
 
 	bt->brp = best_brp;
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment
  2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
                   ` (2 preceding siblings ...)
  2022-09-07 10:38 ` [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing Marc Kleine-Budde
@ 2022-09-07 10:38 ` Marc Kleine-Budde
  2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
  4 siblings, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath, Marc Kleine-Budde

According to "The Configuration of the CAN Bit Timing" [1] the SJW
"may not be longer than either Phase Buffer Segment".

Use bt->phase_seg2 instead of tseg2 (technically these are the same
values). Include bt->phase_seg1 in the limit check, too.

[1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev/calc_bittiming.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/dev/calc_bittiming.c b/drivers/net/can/dev/calc_bittiming.c
index cb9521c8ae8e..b9aa1f7d0b37 100644
--- a/drivers/net/can/dev/calc_bittiming.c
+++ b/drivers/net/can/dev/calc_bittiming.c
@@ -158,8 +158,9 @@ int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt,
 	if (!bt->sjw)
 		bt->sjw = 1;
 
-	/* sjw must not be higher than sjw_max and tseg2 */
-	bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
+	/* sjw must not be higher than sjw_max, phase_seg1, and phase_seg2 */
+	bt->sjw = min3(bt->sjw, btc->sjw_max,
+		       min(bt->phase_seg1, bt->phase_seg2));
 
 	bt->brp = best_brp;
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
  2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
                   ` (3 preceding siblings ...)
  2022-09-07 10:38 ` [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment Marc Kleine-Budde
@ 2022-09-07 10:38 ` Marc Kleine-Budde
  2022-09-07 19:19   ` Thomas.Kopp
  4 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-07 10:38 UTC (permalink / raw)
  To: linux-can; +Cc: Mark Bath, Marc Kleine-Budde

"The (Re-)Synchronization Jump Width (SJW) defines how far a
 resynchronization may move the Sample Point inside the limits defined
 by the Phase Buffer Segments to compensate for edge phase errors." [1]

In other words this means the SJW parameter controls the tolerance of
the CAN controller against the frequency error compared to other CAN
controllers.

If the user space doesn't provide a SJW parameter, the kernel chooses
a default value of 1. This proved to be a good default value for CAN
controllers, but not anymore with modern controllers.

In the past, there were CAN controllers like the sja1000 with a rather
limited range of bit timing parameters. For the standard bit rates
this results in the following bit timing parameters:

| Bit timing parameters for sja1000 with 8.000000 MHz ref clock
|                     _----+--------------=> tseg1: 1 …   16
|                    /    /     _---------=> tseg2: 1 …    8
|                   |    |     /    _-----=> sjw:   1 …    4
|                   |    |    |    /    _-=> brp:   1 …   64 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error  BTR0 BTR1
|  1000000    125   2    3    2   1   1  1000000   0.0%  75.0%  75.0%   0.0%   0x00 0x14
|   800000    125   3    4    2   1   1   800000   0.0%  80.0%  80.0%   0.0%   0x00 0x16
|   666666    125   4    4    3   1   1   666666   0.0%  80.0%  75.0%   6.2%   0x00 0x27
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00 0x1c
|   250000    250   6    7    2   1   2   250000   0.0%  87.5%  87.5%   0.0%   0x01 0x1c
|   125000    500   6    7    2   1   4   125000   0.0%  87.5%  87.5%   0.0%   0x03 0x1c
|   100000    625   6    7    2   1   5   100000   0.0%  87.5%  87.5%   0.0%   0x04 0x1c
|    83333    750   6    7    2   1   6    83333   0.0%  87.5%  87.5%   0.0%   0x05 0x1c
|    50000   1250   6    7    2   1  10    50000   0.0%  87.5%  87.5%   0.0%   0x09 0x1c
|    33333   1875   6    7    2   1  15    33333   0.0%  87.5%  87.5%   0.0%   0x0e 0x1c
|    20000   3125   6    7    2   1  25    20000   0.0%  87.5%  87.5%   0.0%   0x18 0x1c
|    10000   6250   6    7    2   1  50    10000   0.0%  87.5%  87.5%   0.0%   0x31 0x1c

The attentive reader notices that in most cases the SJW is 1, while
the Phase Seg2 is 2. Both values are in TQ units, which itself is a
duration in nanoseconds.

For example the 500 kbit/s configuration:

|  nominal                                  real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error  BTR0 BTR1
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00 0x1c

the TQ is 125ns, the Phase Seg2 is "2" (== 250ns), the SJW is "1" (==
125 ns).

Looking at a more modern CAN controller like a mcp2518fd, it has wider
bit timing registers.

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock
|                     _----+--------------=> tseg1: 2 …  256
|                    /    /     _---------=> tseg2: 1 …  128
|                   |    |     /    _-----=> sjw:   1 …  128
|                   |    |    |    /    _-=> brp:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error      NBTCFG
|   500000     25  34   35   10   1   1   500000   0.0%  87.5%  87.5%   0.0%   0x00440900

The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "1" (==
25ns).

As the kernel chooses a default SJW of 1 independent of the TQ, this
results in a much smaller SJW and thus much smaller tolerances against
frequency errors.

To get the same oscillator tolerances on controllers with wide bit
timing registers, choose a default SJW value of Phase Seg2 / 2. This
results in the following bit timing parameters:

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock
|                     _----+--------------=> tseg1: 2 …  256
|                    /    /     _---------=> tseg2: 1 …  128
|                   |    |     /    _-----=> sjw:   1 …  128
|                   |    |    |    /    _-=> brp:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error      NBTCFG
|   500000     25  34   35   10   5   1   500000   0.0%  87.5%  87.5%   0.0%   0x00440904

The TQ is 25ns, the Phase Seg 2 is "10" (== 250ns), the SJW is "5" (==
125ns). Which is the same as on the sja1000 controller.

[1] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf

Cc: Mark Bath <mark@baggywrinkle.co.uk>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev/calc_bittiming.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/dev/calc_bittiming.c b/drivers/net/can/dev/calc_bittiming.c
index b9aa1f7d0b37..6b4ca7af05fa 100644
--- a/drivers/net/can/dev/calc_bittiming.c
+++ b/drivers/net/can/dev/calc_bittiming.c
@@ -154,9 +154,9 @@ int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt,
 	bt->phase_seg1 = tseg1 - bt->prop_seg;
 	bt->phase_seg2 = tseg2;
 
-	/* If user space provides no sjw, use 1 as default */
+	/* If user space provides no sjw, use sane default of phase_seg2 / 2 */
 	if (!bt->sjw)
-		bt->sjw = 1;
+		bt->sjw = max(1U, bt->phase_seg2 / 2);
 
 	/* sjw must not be higher than sjw_max, phase_seg1, and phase_seg2 */
 	bt->sjw = min3(bt->sjw, btc->sjw_max,
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* RE: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
  2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
@ 2022-09-07 19:19   ` Thomas.Kopp
  2022-09-08 19:57     ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas.Kopp @ 2022-09-07 19:19 UTC (permalink / raw)
  To: mkl, linux-can; +Cc: mark

Hi Marc,

> In other words this means the SJW parameter controls the tolerance of
> the CAN controller against the frequency error compared to other CAN
> controllers.
> 
> If the user space doesn't provide a SJW parameter, the kernel chooses
> a default value of 1. This proved to be a good default value for CAN
> controllers, but not anymore with modern controllers.
> 
> 
> To get the same oscillator tolerances on controllers with wide bit
> timing registers, choose a default SJW value of Phase Seg2 / 2. This
> results in the following bit timing parameters:


Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?

In general the recommendation is to chose SJW as big as possible. Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the entire document) 
"Recom2: Choose sjwA as large as possible. 
- The maximum possible value is sjwA = min(ps1A, ps2A). 
- During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node. 
- If the CAN controller does not allow configuring the maximum possible sjwA 
because the value range for sjwA is limited in the CAN controller, then consider 
increasing BRPA. This allows configuring a larger sjwA, unless this can impact 
phase error tolerance in an unwanted way, see Recom4."

I think we discussed this a while ago already and the biggest reasoning against maxing SJW was breaking "compatibility" to old kernel versions. The compatibility in this is not being able to sync though...
Now, if a change to a larger default values is agreed upon we might as well chose min(phaseseg1, phaseseg2) directly to maximize robustness for the user that doesn't set it explicitly. Given that most of the getting started with Socketcan results on your favorite Search engine don't even mention SJW that's probably a significant portion of the users. From a pure communication standpoint I can't see any downside to doing this, the only hypothetical case would be if a CAN controller had an erratum about higher SJWs and the driver not handling it.

So, in essence, I'd propose to use min(p1,p2).

Best Regards,
Thomas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
  2022-09-07 19:19   ` Thomas.Kopp
@ 2022-09-08 19:57     ` Marc Kleine-Budde
  2022-09-09 12:29       ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-08 19:57 UTC (permalink / raw)
  To: Thomas.Kopp; +Cc: linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 12627 bytes --]

Hello Thomas,

On 07.09.2022 19:19:42, Thomas.Kopp@microchip.com wrote:
> > In other words this means the SJW parameter controls the tolerance of
> > the CAN controller against the frequency error compared to other CAN
> > controllers.
> >
> > If the user space doesn't provide a SJW parameter, the kernel chooses
> > a default value of 1. This proved to be a good default value for CAN
> > controllers, but not anymore with modern controllers.
> >
> > To get the same oscillator tolerances on controllers with wide bit
> > timing registers, choose a default SJW value of Phase Seg2 / 2. This
> > results in the following bit timing parameters:
>
> Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?

According to "Combining CANopen and SAE J1939 networks" [2], CANopen and
J1939 use a SJW of 1:

"The CANopen standard CiA 301 allows various bitrates in the range from
 10 kBit/s to 1 MBit/s. The J1939-11 specification stipulates 250 kBit/s
 and is used in the majority of applications. The J1939-14 standard
 specifies 500 kBit/s for the physical layer.

 Thereby the bitrate for a shared physical layer is limited to the
 bitrates 250 kBit/s and 500 kBit/s. Fortunately both standards define
 the same sample point location at 87,5 % together with a SJW value of
 1."

I think this is why we implemented a default SJW of 1 in the kernel.

As discussed in my patch description a SJW of 1 for controllers that
were common at the time these standard were written, with narrow bit
timing registers, equals "Phase Seg2 / 2".

In "The Configuration of the CAN Bit Timing" [3] the following criteria
for the Oscillator Tolerance Range was given:

"The tolerance range df for an oscillator’s frequency f_osc around the
 nominal frequency f_nom with:

 (1 - df) * f_nom <= f_osc <= (1 + df) * f_nom

 depends on the proportions of Phase_Seg1, Phase_Seg2, SJW, and the bit
 time. The maximum tolerance df is the defined by two conditions (both
 shall be met):

              min(Phase_Seg1, Phase_Seg2)
 I:  df <= --------------------------------
           2 * (13 * bit_time - Phase_Seg2)

                SJW
 II: df <= -------------
           20 * bit_time
"

The 2nd criteria (see "df2" column) is influenced by SJW, so of
interest. Here a sja1000 controller:

| Bit timing parameters for sja1000 with 8.000000 MHz ref clock using algo 'v5.19'
|                     _----+--------------=> TSeg1: 1 …   16
|                    /    /     _---------=> TSeg2: 1 …    8
|                   |    |     /    _-----=> SJW:   1 …    4
|                   |    |    |    /    _-=> BRP:   1 …   64 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2   BTR0 BTR1
|  1000000    125   2    3    2   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x00 0x14
|   800000    125   3    4    2   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00 0x16
|   666666    125   4    4    3   1   1   666666   0.0%  80.0%  75.0%   6.2%   9.8‰   4.2‰   0x00 0x27
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00 0x1c
|   250000    250   6    7    2   1   2   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01 0x1c
|   125000    500   6    7    2   1   4   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x03 0x1c
|   100000    625   6    7    2   1   5   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04 0x1c
|    83333    750   6    7    2   1   6    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x05 0x1c
|    50000   1250   6    7    2   1  10    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x09 0x1c
|    33333   1875   6    7    2   1  15    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x0e 0x1c
|    20000   3125   6    7    2   1  25    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x18 0x1c
|    10000   6250   6    7    2   1  50    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x31 0x1c

"df2" is 3.1‰ for 500kbit/s or slower.

In contrast the mcp251xfd with the current mainline bit timing algorithm
(SJW=1):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'v5.19'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   1.2‰   0x001c0900
|   800000     25  19   20   10   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   1.0‰   0x00260900
|   666666     25  23   24   12   1   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   0.8‰   0x002e0b00
|   500000     25  34   35   10   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.6‰   0x00440900
|   250000     25  69   70   20   1   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x008a1300
|   125000     50  69   70   20   1   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x018a1300
|   100000     50  87   87   25   1   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01ad1800
|    83333     50 104  105   30   1   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01d01d00
|    50000    100  87   87   25   1   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x03ad1800
|    33333    125 104  105   30   1   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x04d01d00
|    20000    250  87   87   25   1  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x09ad1800
|    10000    500  87   87   25   1  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x13ad1800

for slow bit rates "df2" drops to 0.2‰.

Here the mcp251xfd with the updated algorithm (SJW = Phase Seg2 / 2):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'next'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   5   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x001c0904
|   800000     25  19   20   10   5   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00260904
|   666666     25  23   24   12   6   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x002e0b05
|   500000     25  34   35   10   5   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00440904
|   250000     25  69   70   20  10   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x008a1309
|   125000     50  69   70   20  10   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x018a1309
|   100000     50  87   87   25  12   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x01ad180b
|    83333     50 104  105   30  15   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01d01d0e
|    50000    100  87   87   25  12   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x03ad180b
|    33333    125 104  105   30  15   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04d01d0e
|    20000    250  87   87   25  12  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x09ad180b
|    10000    500  87   87   25  12  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x13ad180b

The "df2" stays at ~3.1‰

[2] http://web.archive.org/https://www.can-cia.org/fileadmin/resources/documents/proceedings/2013_mmc_koppe2.pdf
[3] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf


And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJA=max
to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
frames:

| 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can state ERROR-ACTIVE restart-ms 1000
|           bitrate 500000 sample-point 0.850
|           tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 3 brp 1
|           mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp_inc 1
|           clock 10000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          2          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.1

| 97: peakfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD> state ERROR-PASSIVE (berr-counter tx 0 rx 136) restart-ms 1000
|           bitrate 500000 sample-point 0.875
|           tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1 brp 1
|           pcan_usb_fd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp_inc 1
|           dbitrate 2000000 dsample-point 0.750
|           dtq 12 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1 dbrp 1
|           pcan_usb_fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp_inc 1
|           clock 80000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          1          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 2-2.2:1.0

|  (1970-01-01 02:41:33.856148)  peakfd0  RX - -  20000004   [8]  00 04 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-warning}
|  (1970-01-01 02:41:33.857168)  peakfd0  RX - -  20000004   [8]  00 10 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-passive}

With SJW = Phase Seg2 / 2, it works.

> In general the recommendation is to chose SJW as big as possible.
> Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the
> entire document)

CiA 301 and J1939 seem to have a different opinion... 

> "Recom2: Choose sjwA as large as possible.
> - The maximum possible value is sjwA = min(ps1A, ps2A).
> - During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node.
> - If the CAN controller does not allow configuring the maximum possible sjwA
> because the value range for sjwA is limited in the CAN controller, then consider
> increasing BRPA. This allows configuring a larger sjwA, unless this can impact
> phase error tolerance in an unwanted way, see Recom4."
>
> I think we discussed this a while ago already and the biggest
> reasoning against maxing SJW was breaking "compatibility" to old
> kernel versions.

ACK

> The compatibility in this is not being able to sync
> though... Now, if a change to a larger default values is agreed upon
> we might as well chose min(phaseseg1, phaseseg2) directly to maximize
> robustness for the user that doesn't set it explicitly. Given that
> most of the getting started with Socketcan results on your favorite
> Search engine don't even mention SJW that's probably a significant
> portion of the users. From a pure communication standpoint I can't see
> any downside to doing this, the only hypothetical case would be if a
> CAN controller had an erratum about higher SJWs and the driver not
> handling it.
>
> So, in essence, I'd propose to use min(p1,p2).

NACK - as I have a setup that doesn't work anymore when mixing an old
and a new kernel.

regards,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
  2022-09-08 19:57     ` Marc Kleine-Budde
@ 2022-09-09 12:29       ` Marc Kleine-Budde
  2022-09-26  6:15         ` Thomas.Kopp
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2022-09-09 12:29 UTC (permalink / raw)
  To: Thomas.Kopp; +Cc: linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On 08.09.2022 21:57:16, Marc Kleine-Budde wrote:
> > Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?

[...]

> And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJW=max
> to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
> frames:

> With SJW = Phase Seg2 / 2, it works.

Correction:

My test:

    TX (mcp2515, 20 MHz OSC) -> RX (PEAK USB)

Is broken [1] if the PEAK USB uses SJW==1, independent of the SJW of the
mcp2515 (the sender).

The test works if the PEAK USB uses SJW=PhaseSeg2/2.

regards,
Marc

[1] The PEAK USB fails to receive CAN frames with certain contents:
    fails: aaa#aa.00.00.55.00.55.ab.55
    works: aaa#aa.00.00.55.00.55.aa.55

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
@ 2022-09-11  8:24   ` Vincent Mailhol
  2022-09-12  5:56     ` Thomas.Kopp
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Mailhol @ 2022-09-11  8:24 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, Mark Bath

On Wed. 7 sept. 2022 à 19:59, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> In can_calc_bittiming() there are several open coded checks to ensure
> that SJW is within certain limits. Replace this by a single call to
> min3().
>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
>  drivers/net/can/dev/calc_bittiming.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/can/dev/calc_bittiming.c b/drivers/net/can/dev/calc_bittiming.c
> index d3caa040614d..ce6bef2444a2 100644
> --- a/drivers/net/can/dev/calc_bittiming.c
> +++ b/drivers/net/can/dev/calc_bittiming.c
> @@ -158,12 +158,8 @@ int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt,
>         if (!bt->sjw || !btc->sjw_max) {
>                 bt->sjw = 1;
>         } else {
> -               /* bt->sjw is at least 1 -> sanitize upper bound to sjw_max */
> -               if (bt->sjw > btc->sjw_max)
> -                       bt->sjw = btc->sjw_max;
> -               /* bt->sjw must not be higher than tseg2 */
> -               if (tseg2 < bt->sjw)
> -                       bt->sjw = tseg2;
> +               /* sjw must not be higher than sjw_max and tseg2 */
> +               bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);

Not directly a criticism of this patch (as things were already like
that), but if the user provides an incorrect value for SJW (or any
other bittiming argument), wouldn't it be better to inform? Returning
-EINVAL might be too violent. Maybe a dmesg would be good?


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-11  8:24   ` Vincent Mailhol
@ 2022-09-12  5:56     ` Thomas.Kopp
  2022-09-12  8:28       ` Vincent Mailhol
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas.Kopp @ 2022-09-12  5:56 UTC (permalink / raw)
  To: vincent.mailhol, mkl; +Cc: linux-can, mark

> > In can_calc_bittiming() there are several open coded checks to ensure
> > that SJW is within certain limits. Replace this by a single call to
> > min3().
> >
> > Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> > ---
> >  drivers/net/can/dev/calc_bittiming.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/can/dev/calc_bittiming.c
> b/drivers/net/can/dev/calc_bittiming.c
> > index d3caa040614d..ce6bef2444a2 100644
> > --- a/drivers/net/can/dev/calc_bittiming.c
> > +++ b/drivers/net/can/dev/calc_bittiming.c
> > @@ -158,12 +158,8 @@ int can_calc_bittiming(const struct net_device
> *dev, struct can_bittiming *bt,
> >         if (!bt->sjw || !btc->sjw_max) {
> >                 bt->sjw = 1;
> >         } else {
> > -               /* bt->sjw is at least 1 -> sanitize upper bound to sjw_max */
> > -               if (bt->sjw > btc->sjw_max)
> > -                       bt->sjw = btc->sjw_max;
> > -               /* bt->sjw must not be higher than tseg2 */
> > -               if (tseg2 < bt->sjw)
> > -                       bt->sjw = tseg2;
> > +               /* sjw must not be higher than sjw_max and tseg2 */
> > +               bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
> 
> Not directly a criticism of this patch (as things were already like
> that), but if the user provides an incorrect value for SJW (or any
> other bittiming argument), wouldn't it be better to inform? Returning
> -EINVAL might be too violent. Maybe a dmesg would be good?

I'd say it would be consistent to keep returning -EINVAL (at least for the SJW>(min p1,p2)) condition. The same is done for FD Bitrate < Arbitration bitrate and both conditions have the same level of justification in the ISO 11898-1:2015
"The data bit time shall have the same length as the nominal bit time or it shall be shorter than the nominal bit time."

"In nominal bit time and in data bit time, SJW shall be less than or equal to the minimum of these two items: Phase_Seg1 and Phase_Seg2."

Shall is a binding requirement to be ISO conformant in this case.

Best Regards,
Thomas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-12  5:56     ` Thomas.Kopp
@ 2022-09-12  8:28       ` Vincent Mailhol
  2023-01-31 16:04         ` Marc Kleine-Budde
  2023-02-01  9:05         ` Marc Kleine-Budde
  0 siblings, 2 replies; 20+ messages in thread
From: Vincent Mailhol @ 2022-09-12  8:28 UTC (permalink / raw)
  To: Thomas.Kopp; +Cc: mkl, linux-can, mark

On Mon. 12 Sept. 2022 at 14:56, <Thomas.Kopp@microchip.com> wrote:
> > > In can_calc_bittiming() there are several open coded checks to ensure
> > > that SJW is within certain limits. Replace this by a single call to
> > > min3().
> > >
> > > Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> > > ---
> > >  drivers/net/can/dev/calc_bittiming.c | 8 ++------
> > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/can/dev/calc_bittiming.c
> > b/drivers/net/can/dev/calc_bittiming.c
> > > index d3caa040614d..ce6bef2444a2 100644
> > > --- a/drivers/net/can/dev/calc_bittiming.c
> > > +++ b/drivers/net/can/dev/calc_bittiming.c
> > > @@ -158,12 +158,8 @@ int can_calc_bittiming(const struct net_device
> > *dev, struct can_bittiming *bt,
> > >         if (!bt->sjw || !btc->sjw_max) {
> > >                 bt->sjw = 1;
> > >         } else {
> > > -               /* bt->sjw is at least 1 -> sanitize upper bound to sjw_max */
> > > -               if (bt->sjw > btc->sjw_max)
> > > -                       bt->sjw = btc->sjw_max;
> > > -               /* bt->sjw must not be higher than tseg2 */
> > > -               if (tseg2 < bt->sjw)
> > > -                       bt->sjw = tseg2;
> > > +               /* sjw must not be higher than sjw_max and tseg2 */
> > > +               bt->sjw = min3(bt->sjw, btc->sjw_max, tseg2);
> >
> > Not directly a criticism of this patch (as things were already like
> > that), but if the user provides an incorrect value for SJW (or any
> > other bittiming argument), wouldn't it be better to inform? Returning
> > -EINVAL might be too violent. Maybe a dmesg would be good?
>
> I'd say it would be consistent to keep returning -EINVAL (at least for the SJW>(min p1,p2)) condition.
>
> The same is done for FD Bitrate < Arbitration bitrate and both conditions have the same level of justification in the ISO 11898-1:2015
> "The data bit time shall have the same length as the nominal bit time or it shall be shorter than the nominal bit time."
>
> "In nominal bit time and in data bit time, SJW shall be less than or equal to the minimum of these two items: Phase_Seg1 and Phase_Seg2."
>
> Shall is a binding requirement to be ISO conformant in this case.

What you say makes sense.

Also, I was assuming that can_fixup_bittiming() was already doing the
out of range check:
https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/net/can/dev/bittiming.c#L27

But in reality, only one of either can_calc_bittiming(),
can_fixup_bittiming() or can_validate_bitrate() is being called. And
thus, can_validate_bitrate() might be called with out of range values
and in that case the neltink interface should return -ERANGE (for
example if sjw > sjw_max).

Sees that there is more work to do here than initially anticipated.


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW
  2022-09-09 12:29       ` Marc Kleine-Budde
@ 2022-09-26  6:15         ` Thomas.Kopp
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas.Kopp @ 2022-09-26  6:15 UTC (permalink / raw)
  To: mkl; +Cc: linux-can, mark

> On 08.09.2022 21:57:16, Marc Kleine-Budde wrote:
> > > Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?
> 
> [...]
> 
> > And I tested with SJW=max from mcp251x (not the mcp251xfd) with
> SJW=max
> > to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
> > frames:
> 
> > With SJW = Phase Seg2 / 2, it works.
> 
> Correction:
> 
> My test:
> 
>     TX (mcp2515, 20 MHz OSC) -> RX (PEAK USB)
> 
> Is broken [1] if the PEAK USB uses SJW==1, independent of the SJW of the
> mcp2515 (the sender).
> 
> The test works if the PEAK USB uses SJW=PhaseSeg2/2.

So everything works as expected (or doesn't work, also as expected,) that's good. So, how do we move forward here? I'd still say maximizing SJW is the better thing to do but we should get definitely get it increased and if PS/2 is the compromise I can live with that(though, there are no drawbacks to PS/2 besides "compatibility").

Best Regards,
Thomas 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-12  8:28       ` Vincent Mailhol
@ 2023-01-31 16:04         ` Marc Kleine-Budde
  2023-02-01  5:49           ` Vincent Mailhol
  2023-02-01  9:05         ` Marc Kleine-Budde
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2023-01-31 16:04 UTC (permalink / raw)
  To: Vincent Mailhol; +Cc: Thomas.Kopp, linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 1646 bytes --]

Finally I've found some time to look at this again...

On 12.09.2022 17:28:15, Vincent Mailhol wrote:
> Also, I was assuming that can_fixup_bittiming() was already doing the
> out of range check:
> https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/net/can/dev/bittiming.c#L27
>
> But in reality, only one of either can_calc_bittiming(),
> can_fixup_bittiming() or can_validate_bitrate() is being called. And
> thus, can_validate_bitrate() might be called with out of range values
> and in that case the neltink interface should return -ERANGE (for
> example if sjw > sjw_max).
>
> Sees that there is more work to do here than initially anticipated.

I've converted the existing netdev_err() NL_SET_ERR_MSG_FMT(). This
means the error message is transported via netlink to user space and
printed by the "ip" tool.

| # ip link set flexcan0 txqueuelen 10 type can bitrate 2200000
| Warning: bitrate error 2.5%.
| 
| # ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
| Error: bitrate error 80.5% too high.

This is the error message for the SJW check:

| # ip link set flexcan0 txqueuelen 10 type can bitrate 500000 sjw 3
| Error: SJW 3 bigger than phase_seg1 6 and/or phase_seg2 2.

Maybe add a '=' between the phase_seg and the actual number:

| Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2023-01-31 16:04         ` Marc Kleine-Budde
@ 2023-02-01  5:49           ` Vincent Mailhol
  2023-02-01  8:58             ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Mailhol @ 2023-02-01  5:49 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Thomas.Kopp, linux-can, mark

On Wed. 1 Feb 2023 at 01:04, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> Finally I've found some time to look at this again...
>
> On 12.09.2022 17:28:15, Vincent Mailhol wrote:
> > Also, I was assuming that can_fixup_bittiming() was already doing the
> > out of range check:
> > https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/net/can/dev/bittiming.c#L27
> >
> > But in reality, only one of either can_calc_bittiming(),
> > can_fixup_bittiming() or can_validate_bitrate() is being called. And
> > thus, can_validate_bitrate() might be called with out of range values
> > and in that case the neltink interface should return -ERANGE (for
> > example if sjw > sjw_max).
> >
> > Sees that there is more work to do here than initially anticipated.
>
> I've converted the existing netdev_err() NL_SET_ERR_MSG_FMT(). This
> means the error message is transported via netlink to user space and
> printed by the "ip" tool.

I was not aware of this NL_SET_ERR_MSG_FMT() thing, and let me say
that I really like that solution!

> | # ip link set flexcan0 txqueuelen 10 type can bitrate 2200000
> | Warning: bitrate error 2.5%.

The Linux coding style says:

  Kernel messages do not have to be terminated with a period.

Ref: https://www.kernel.org/doc/html/latest/process/coding-style.html#printing-kernel-messages

However, I am not sure if this also applies to the ip tool.

> |
> | # ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
> | Error: bitrate error 80.5% too high.
>
> This is the error message for the SJW check:
>
> | # ip link set flexcan0 txqueuelen 10 type can bitrate 500000 sjw 3
> | Error: SJW 3 bigger than phase_seg1 6 and/or phase_seg2 2.

At that point in the code, I assume that sjw was already validated
against sjw_max. While not impossible, I think that having sjw bigger
than both phase_seg1 and phase_seg2 at the same time is uncommon. I
suggest to split the error message in two and only print the relevant
one:

  Error: SJW 3 bigger than phase_seg1 x
  Error: SJW 3 bigger than phase_seg2 x

If the user still manages to violate both, only the phase_seg1 error
message is displayed.

> Maybe add a '=' between the phase_seg and the actual number:
>
> | Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.

If you ask me my preferences, I will go with column:

  Error: SJW: 3 bigger than phase_seg1: 6 and/or phase_seg2: 2.

But I will not complain if you pick anything else.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2023-02-01  5:49           ` Vincent Mailhol
@ 2023-02-01  8:58             ` Marc Kleine-Budde
  2023-02-01  9:38               ` Vincent Mailhol
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2023-02-01  8:58 UTC (permalink / raw)
  To: Vincent Mailhol; +Cc: Thomas.Kopp, linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]

On 01.02.2023 14:49:23, Vincent Mailhol wrote:
> > I've converted the existing netdev_err() NL_SET_ERR_MSG_FMT(). This
> > means the error message is transported via netlink to user space and
> > printed by the "ip" tool.
> 
> I was not aware of this NL_SET_ERR_MSG_FMT() thing, and let me say
> that I really like that solution!
> 
> > | # ip link set flexcan0 txqueuelen 10 type can bitrate 2200000
> > | Warning: bitrate error 2.5%.
> 
> The Linux coding style says:
> 
>   Kernel messages do not have to be terminated with a period.
> 
> Ref: https://www.kernel.org/doc/html/latest/process/coding-style.html#printing-kernel-messages
> 
> However, I am not sure if this also applies to the ip tool.

I think the period is added by "ip".

> > |
> > | # ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
> > | Error: bitrate error 80.5% too high.
> >
> > This is the error message for the SJW check:
> >
> > | # ip link set flexcan0 txqueuelen 10 type can bitrate 500000 sjw 3
> > | Error: SJW 3 bigger than phase_seg1 6 and/or phase_seg2 2.
> 
> At that point in the code, I assume that sjw was already validated
> against sjw_max. While not impossible, I think that having sjw bigger
> than both phase_seg1 and phase_seg2 at the same time is uncommon. I
> suggest to split the error message in two and only print the relevant
> one:
> 
>   Error: SJW 3 bigger than phase_seg1 x
>   Error: SJW 3 bigger than phase_seg2 x
> 
> If the user still manages to violate both, only the phase_seg1 error
> message is displayed.

Makes sense.

> > Maybe add a '=' between the phase_seg and the actual number:
> >
> > | Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.
> 
> If you ask me my preferences, I will go with column:
> 
>   Error: SJW: 3 bigger than phase_seg1: 6 and/or phase_seg2: 2.
> 
> But I will not complain if you pick anything else.

Period looks good.

What about the error value? Always return -EINVAL instead of a mix of
-EINVAL and -ERANGE?

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2022-09-12  8:28       ` Vincent Mailhol
  2023-01-31 16:04         ` Marc Kleine-Budde
@ 2023-02-01  9:05         ` Marc Kleine-Budde
  1 sibling, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2023-02-01  9:05 UTC (permalink / raw)
  To: Vincent Mailhol; +Cc: Thomas.Kopp, linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 2292 bytes --]

On 12.09.2022 17:28:15, Vincent Mailhol wrote:
> > > Not directly a criticism of this patch (as things were already like
> > > that), but if the user provides an incorrect value for SJW (or any
> > > other bittiming argument), wouldn't it be better to inform? Returning
> > > -EINVAL might be too violent. Maybe a dmesg would be good?
> >
> > I'd say it would be consistent to keep returning -EINVAL (at least for the SJW>(min p1,p2)) condition.
> >
> > The same is done for FD Bitrate < Arbitration bitrate and both conditions have the same level of justification in the ISO 11898-1:2015
> > "The data bit time shall have the same length as the nominal bit time or it shall be shorter than the nominal bit time."
> >
> > "In nominal bit time and in data bit time, SJW shall be less than or equal to the minimum of these two items: Phase_Seg1 and Phase_Seg2."
> >
> > Shall is a binding requirement to be ISO conformant in this case.
> 
> What you say makes sense.
> 
> Also, I was assuming that can_fixup_bittiming() was already doing the
> out of range check:
> https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/net/can/dev/bittiming.c#L27
> 
> But in reality, only one of either can_calc_bittiming(),
> can_fixup_bittiming() or can_validate_bitrate() is being called. And
> thus, can_validate_bitrate() might be called with out of range values
> and in that case the neltink interface should return -ERANGE (for
> example if sjw > sjw_max).
> 
> Sees that there is more work to do here than initially anticipated.

If you configure the bitrate + sjw there will be a check for sjw within
the limits (< sjw_max, < min(phase_seg1, phase_seg2)).

If you configure the low level timing parameters there will be limit
checks on the user supplied phase_seg1, phase_seg2 and sjw and a limit
check on the calculated brp.

What should be the return value if these checks fail?

I think it's best to always return -EINVAL and set an error message with
NL_SET_ERR_MSG_FMT().

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2023-02-01  8:58             ` Marc Kleine-Budde
@ 2023-02-01  9:38               ` Vincent Mailhol
  2023-02-02  9:55                 ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Vincent Mailhol @ 2023-02-01  9:38 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Thomas.Kopp, linux-can, mark

On Wed. 1 Feb 2023 at 17:58, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 01.02.2023 14:49:23, Vincent Mailhol wrote:
> > > I've converted the existing netdev_err() NL_SET_ERR_MSG_FMT(). This
> > > means the error message is transported via netlink to user space and
> > > printed by the "ip" tool.
> >
> > I was not aware of this NL_SET_ERR_MSG_FMT() thing, and let me say
> > that I really like that solution!
> >
> > > | # ip link set flexcan0 txqueuelen 10 type can bitrate 2200000
> > > | Warning: bitrate error 2.5%.
> >
> > The Linux coding style says:
> >
> >   Kernel messages do not have to be terminated with a period.
> >
> > Ref: https://www.kernel.org/doc/html/latest/process/coding-style.html#printing-kernel-messages
> >
> > However, I am not sure if this also applies to the ip tool.
>
> I think the period is added by "ip".

Ack. All good then!

> > > |
> > > | # ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
> > > | Error: bitrate error 80.5% too high.
> > >
> > > This is the error message for the SJW check:
> > >
> > > | # ip link set flexcan0 txqueuelen 10 type can bitrate 500000 sjw 3
> > > | Error: SJW 3 bigger than phase_seg1 6 and/or phase_seg2 2.
> >
> > At that point in the code, I assume that sjw was already validated
> > against sjw_max. While not impossible, I think that having sjw bigger
> > than both phase_seg1 and phase_seg2 at the same time is uncommon. I
> > suggest to split the error message in two and only print the relevant
> > one:
> >
> >   Error: SJW 3 bigger than phase_seg1 x
> >   Error: SJW 3 bigger than phase_seg2 x
> >
> > If the user still manages to violate both, only the phase_seg1 error
> > message is displayed.
>
> Makes sense.
>
> > > Maybe add a '=' between the phase_seg and the actual number:
> > >
> > > | Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.
> >
> > If you ask me my preferences, I will go with column:
> >
> >   Error: SJW: 3 bigger than phase_seg1: 6 and/or phase_seg2: 2.
> >
> > But I will not complain if you pick anything else.
>
> Period looks good.
>
> What about the error value? Always return -EINVAL instead of a mix of
> -EINVAL and -ERANGE?

Looking at the comments from uapi/asm-generic/errno-base.h:

  #define EINVAL 22 /* Invalid argument */
  #define ERANGE 34 /* Math result not representable */

We are not dealing with some non-representable math results, so
-ERANGE is technically incorrect. I did suggest -ERANGE in the past
because it looks natural to me: we define a min and a max, i.e. its a
range. But the doc tells me I am wrong. Naming the error -ENAN (Error
Not A Number) would have been more explicit, but we are not going to
rewrite UAPI error definitions.

Go with -EINVAL This answer also applies to your next message and to
everywhere else in netlink, I guess.


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2023-02-01  9:38               ` Vincent Mailhol
@ 2023-02-02  9:55                 ` Marc Kleine-Budde
  2023-02-02 10:29                   ` Vincent Mailhol
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2023-02-02  9:55 UTC (permalink / raw)
  To: Vincent Mailhol; +Cc: Thomas.Kopp, linux-can, mark

[-- Attachment #1: Type: text/plain, Size: 2505 bytes --]

On 01.02.2023 18:38:17, Vincent Mailhol wrote:
[...]
> > > > Maybe add a '=' between the phase_seg and the actual number:
> > > >
> > > > | Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.
> > >
> > > If you ask me my preferences, I will go with column:
> > >
> > >   Error: SJW: 3 bigger than phase_seg1: 6 and/or phase_seg2: 2.
> > >
> > > But I will not complain if you pick anything else.
> >
> > Period looks good.

Now looks like this:

| sudo ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
| Error: bitrate error: 80.5% too high.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can bitrate 2200000 
| Warning: bitrate error: 2.5%.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can bitrate 1000000 sjw 10
| Error: sjw: 10 greater than max sjw: 4.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 3 phase-seg1 1 phase-seg2 8 tq 10  
| Error: resulting brp: 0 less than brp-min: 1.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 3 phase-seg1 4 phase-seg2 4 tq 30 sjw 5
| Error: sjw: 5 greater than max sjw: 4.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 4 phase-seg1 3 phase-seg2 4 tq 30 sjw 4
| Error: sjw: 4 greater than phase-seg1: 3.
| 
| sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 4 phase-seg1 4 phase-seg2 3 tq 30 sjw 4
| Error: sjw: 4 greater than phase-seg2: 3.

> > What about the error value? Always return -EINVAL instead of a mix of
> > -EINVAL and -ERANGE?
> 
> Looking at the comments from uapi/asm-generic/errno-base.h:
> 
>   #define EINVAL 22 /* Invalid argument */
>   #define ERANGE 34 /* Math result not representable */
> 
> We are not dealing with some non-representable math results, so
> -ERANGE is technically incorrect. I did suggest -ERANGE in the past
> because it looks natural to me: we define a min and a max, i.e. its a
> range. But the doc tells me I am wrong. Naming the error -ENAN (Error
> Not A Number) would have been more explicit, but we are not going to
> rewrite UAPI error definitions.
> 
> Go with -EINVAL This answer also applies to your next message and to
> everywhere else in netlink, I guess.

It's -EINVAL everywhere now.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3()
  2023-02-02  9:55                 ` Marc Kleine-Budde
@ 2023-02-02 10:29                   ` Vincent Mailhol
  0 siblings, 0 replies; 20+ messages in thread
From: Vincent Mailhol @ 2023-02-02 10:29 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Thomas.Kopp, linux-can, mark

On Thu. 2 Feb 2023 at 18:55, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 01.02.2023 18:38:17, Vincent Mailhol wrote:
> [...]
> > > > > Maybe add a '=' between the phase_seg and the actual number:
> > > > >
> > > > > | Error: SJW 3 bigger than phase_seg1=6 and/or phase_seg2=2.
> > > >
> > > > If you ask me my preferences, I will go with column:
> > > >
> > > >   Error: SJW: 3 bigger than phase_seg1: 6 and/or phase_seg2: 2.
> > > >
> > > > But I will not complain if you pick anything else.
> > >
> > > Period looks good.
>
> Now looks like this:
>
> | sudo ip link set flexcan0 txqueuelen 10 type can bitrate 22000000
> | Error: bitrate error: 80.5% too high.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can bitrate 2200000
> | Warning: bitrate error: 2.5%.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can bitrate 1000000 sjw 10
> | Error: sjw: 10 greater than max sjw: 4.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 3 phase-seg1 1 phase-seg2 8 tq 10
> | Error: resulting brp: 0 less than brp-min: 1.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 3 phase-seg1 4 phase-seg2 4 tq 30 sjw 5
> | Error: sjw: 5 greater than max sjw: 4.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 4 phase-seg1 3 phase-seg2 4 tq 30 sjw 4
> | Error: sjw: 4 greater than phase-seg1: 3.
> |
> | sudo ip link set flexcan0 txqueuelen 10 type can prop-seg 4 phase-seg1 4 phase-seg2 3 tq 30 sjw 4
> | Error: sjw: 4 greater than phase-seg2: 3.

I like it. I am looking forward to seeing the patches.

> > > What about the error value? Always return -EINVAL instead of a mix of
> > > -EINVAL and -ERANGE?
> >
> > Looking at the comments from uapi/asm-generic/errno-base.h:
> >
> >   #define EINVAL 22 /* Invalid argument */
> >   #define ERANGE 34 /* Math result not representable */
> >
> > We are not dealing with some non-representable math results, so
> > -ERANGE is technically incorrect. I did suggest -ERANGE in the past
> > because it looks natural to me: we define a min and a max, i.e. its a
> > range. But the doc tells me I am wrong. Naming the error -ENAN (Error
> > Not A Number) would have been more explicit, but we are not going to
> > rewrite UAPI error definitions.
> >
> > Go with -EINVAL This answer also applies to your next message and to
> > everywhere else in netlink, I guess.
>
> It's -EINVAL everywhere now.

\o/

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-02-02 10:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-07 10:38 [PATCH 0/5] can: bittiming: improve SJW handling and default Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 1/5] can: dev: register_candev(): ensure that bittiming const are valid Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 2/5] can: bittiming: can_calc_bittiming(): make use of min3() Marc Kleine-Budde
2022-09-11  8:24   ` Vincent Mailhol
2022-09-12  5:56     ` Thomas.Kopp
2022-09-12  8:28       ` Vincent Mailhol
2023-01-31 16:04         ` Marc Kleine-Budde
2023-02-01  5:49           ` Vincent Mailhol
2023-02-01  8:58             ` Marc Kleine-Budde
2023-02-01  9:38               ` Vincent Mailhol
2023-02-02  9:55                 ` Marc Kleine-Budde
2023-02-02 10:29                   ` Vincent Mailhol
2023-02-01  9:05         ` Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 3/5] can: bittiming: can_calc_bittiming(): clean up SJW sanitizing Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 4/5] can: bittiming: can_calc_bittiming(): ensure that SJW is not longer than either Phase Buffer Segment Marc Kleine-Budde
2022-09-07 10:38 ` [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW Marc Kleine-Budde
2022-09-07 19:19   ` Thomas.Kopp
2022-09-08 19:57     ` Marc Kleine-Budde
2022-09-09 12:29       ` Marc Kleine-Budde
2022-09-26  6:15         ` Thomas.Kopp

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.