Linux-Can Archive on lore.kernel.org
 help / color / Atom feed
* [RFC]: can 2020-10-19
@ 2020-10-19 19:05 Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 01/16] can: proc: can_remove_proc(): silence remove_proc_entry warning Marc Kleine-Budde
                   ` (15 more replies)
  0 siblings, 16 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel

Hello,

this is a RFC series of all patches, I have collected for my next pull request
to net/master.

regards,
Marc



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 01/16] can: proc: can_remove_proc(): silence remove_proc_entry warning
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 02/16] can: rx-offload: don't call kfree_skb() from IRQ context Marc Kleine-Budde
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Zhang Changzhong, Oliver Hartkopp, Marc Kleine-Budde

From: Zhang Changzhong <zhangchangzhong@huawei.com>

If can_init_proc() fail to create /proc/net/can directory, can_remove_proc()
will trigger a warning:

WARNING: CPU: 6 PID: 7133 at fs/proc/generic.c:672 remove_proc_entry+0x17b0
Kernel panic - not syncing: panic_on_warn set ...

Fix to return early from can_remove_proc() if can proc_dir does not exists.

Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1594709090-3203-1-git-send-email-zhangchangzhong@huawei.com
Fixes: 8e8cda6d737d ("can: initial support for network namespaces")
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/proc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/can/proc.c b/net/can/proc.c
index 550928b8b8a2..5ea8695f507e 100644
--- a/net/can/proc.c
+++ b/net/can/proc.c
@@ -462,6 +462,9 @@ void can_init_proc(struct net *net)
  */
 void can_remove_proc(struct net *net)
 {
+	if (!net->can.proc_dir)
+		return;
+
 	if (net->can.pde_stats)
 		remove_proc_entry(CAN_PROC_STATS, net->can.proc_dir);
 
@@ -486,6 +489,5 @@ void can_remove_proc(struct net *net)
 	if (net->can.pde_rcvlist_sff)
 		remove_proc_entry(CAN_PROC_RCVLIST_SFF, net->can.proc_dir);
 
-	if (net->can.proc_dir)
-		remove_proc_entry("can", net->proc_net);
+	remove_proc_entry("can", net->proc_net);
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 02/16] can: rx-offload: don't call kfree_skb() from IRQ context
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 01/16] can: proc: can_remove_proc(): silence remove_proc_entry warning Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 03/16] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard " Marc Kleine-Budde
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Marc Kleine-Budde

A CAN driver, using the rx-offload infrastructure, is reading CAN frames
(usually in IRQ context) from the hardware and placing it into the rx-offload
queue to be delivered to the networking stack via NAPI.

In case the rx-offload queue is full, trying to add more skbs results in the
skbs being dropped using kfree_skb(). If done from hard-IRQ context this
results in the following warning:

[  682.552693] ------------[ cut here ]------------
[  682.557360] WARNING: CPU: 0 PID: 3057 at net/core/skbuff.c:650 skb_release_head_state+0x74/0x84
[  682.566075] Modules linked in: can_raw can coda_vpu flexcan dw_hdmi_ahb_audio v4l2_jpeg imx_vdoa can_dev
[  682.575597] CPU: 0 PID: 3057 Comm: cansend Tainted: G        W         5.7.0+ #18
[  682.583098] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  682.589657] [<c0112628>] (unwind_backtrace) from [<c010c1c4>] (show_stack+0x10/0x14)
[  682.597423] [<c010c1c4>] (show_stack) from [<c06c481c>] (dump_stack+0xe0/0x114)
[  682.604759] [<c06c481c>] (dump_stack) from [<c0128f10>] (__warn+0xc0/0x10c)
[  682.611742] [<c0128f10>] (__warn) from [<c0129314>] (warn_slowpath_fmt+0x5c/0xc0)
[  682.619248] [<c0129314>] (warn_slowpath_fmt) from [<c0b95dec>] (skb_release_head_state+0x74/0x84)
[  682.628143] [<c0b95dec>] (skb_release_head_state) from [<c0b95e08>] (skb_release_all+0xc/0x24)
[  682.636774] [<c0b95e08>] (skb_release_all) from [<c0b95eac>] (kfree_skb+0x74/0x1c8)
[  682.644479] [<c0b95eac>] (kfree_skb) from [<bf001d1c>] (can_rx_offload_queue_sorted+0xe0/0xe8 [can_dev])
[  682.654051] [<bf001d1c>] (can_rx_offload_queue_sorted [can_dev]) from [<bf001d6c>] (can_rx_offload_get_echo_skb+0x48/0x94 [can_dev])
[  682.666007] [<bf001d6c>] (can_rx_offload_get_echo_skb [can_dev]) from [<bf01efe4>] (flexcan_irq+0x194/0x5dc [flexcan])
[  682.676734] [<bf01efe4>] (flexcan_irq [flexcan]) from [<c019c1ec>] (__handle_irq_event_percpu+0x4c/0x3ec)
[  682.686322] [<c019c1ec>] (__handle_irq_event_percpu) from [<c019c5b8>] (handle_irq_event_percpu+0x2c/0x88)
[  682.695993] [<c019c5b8>] (handle_irq_event_percpu) from [<c019c64c>] (handle_irq_event+0x38/0x5c)
[  682.704887] [<c019c64c>] (handle_irq_event) from [<c01a1058>] (handle_fasteoi_irq+0xc8/0x180)
[  682.713432] [<c01a1058>] (handle_fasteoi_irq) from [<c019b2c0>] (generic_handle_irq+0x30/0x44)
[  682.722063] [<c019b2c0>] (generic_handle_irq) from [<c019b8f8>] (__handle_domain_irq+0x64/0xdc)
[  682.730783] [<c019b8f8>] (__handle_domain_irq) from [<c06df4a4>] (gic_handle_irq+0x48/0x9c)
[  682.739158] [<c06df4a4>] (gic_handle_irq) from [<c0100b30>] (__irq_svc+0x70/0x98)
[  682.746656] Exception stack(0xe80e9dd8 to 0xe80e9e20)
[  682.751725] 9dc0:                                                       00000001 e80e8000
[  682.759922] 9de0: e820cf80 00000000 ffffe000 00000000 eaf08fe4 00000000 600d0013 00000000
[  682.768117] 9e00: c1732e3c c16093a8 e820d4c0 e80e9e28 c018a57c c018b870 600d0013 ffffffff
[  682.776315] [<c0100b30>] (__irq_svc) from [<c018b870>] (lock_acquire+0x108/0x4e8)
[  682.783821] [<c018b870>] (lock_acquire) from [<c0e938e4>] (down_write+0x48/0xa8)
[  682.791242] [<c0e938e4>] (down_write) from [<c02818dc>] (unlink_file_vma+0x24/0x40)
[  682.798922] [<c02818dc>] (unlink_file_vma) from [<c027a258>] (free_pgtables+0x34/0xb8)
[  682.806858] [<c027a258>] (free_pgtables) from [<c02835a4>] (exit_mmap+0xe4/0x170)
[  682.814361] [<c02835a4>] (exit_mmap) from [<c01248e0>] (mmput+0x5c/0x110)
[  682.821171] [<c01248e0>] (mmput) from [<c012e910>] (do_exit+0x374/0xbe4)
[  682.827892] [<c012e910>] (do_exit) from [<c0130888>] (do_group_exit+0x38/0xb4)
[  682.835132] [<c0130888>] (do_group_exit) from [<c0130914>] (__wake_up_parent+0x0/0x14)
[  682.843063] irq event stamp: 1936
[  682.846399] hardirqs last  enabled at (1935): [<c02938b0>] rmqueue+0xf4/0xc64
[  682.853553] hardirqs last disabled at (1936): [<c0100b20>] __irq_svc+0x60/0x98
[  682.860799] softirqs last  enabled at (1878): [<bf04cdcc>] raw_release+0x108/0x1f0 [can_raw]
[  682.869256] softirqs last disabled at (1876): [<c0b8f478>] release_sock+0x18/0x98
[  682.876753] ---[ end trace 7bca4751ce44c444 ]---

This patch fixes the problem by replacing the kfree_skb() by
dev_kfree_skb_any(), as rx-offload might be called from threaded IRQ handlers
as well.

Fixes: ca913f1ac024 ("can: rx-offload: can_rx_offload_queue_sorted(): fix error handling, avoid skb mem leak")
Fixes: 6caf8a6d6586 ("can: rx-offload: can_rx_offload_queue_tail(): fix error handling, avoid skb mem leak")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/rx-offload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/rx-offload.c b/drivers/net/can/rx-offload.c
index 3b180269a92d..6e95193b215b 100644
--- a/drivers/net/can/rx-offload.c
+++ b/drivers/net/can/rx-offload.c
@@ -245,7 +245,7 @@ int can_rx_offload_queue_sorted(struct can_rx_offload *offload,
 
 	if (skb_queue_len(&offload->skb_queue) >
 	    offload->skb_queue_len_max) {
-		kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 		return -ENOBUFS;
 	}
 
@@ -290,7 +290,7 @@ int can_rx_offload_queue_tail(struct can_rx_offload *offload,
 {
 	if (skb_queue_len(&offload->skb_queue) >
 	    offload->skb_queue_len_max) {
-		kfree_skb(skb);
+		dev_kfree_skb_any(skb);
 		return -ENOBUFS;
 	}
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 03/16] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 01/16] can: proc: can_remove_proc(): silence remove_proc_entry warning Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 02/16] can: rx-offload: don't call kfree_skb() from IRQ context Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames Marc Kleine-Budde
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Vincent Mailhol, Marc Kleine-Budde

From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but
not always, the case), the 'WARN_ON(in_irq)' in
net/core/skbuff.c#skb_release_head_state() might be triggered, under network
congestion circumstances, together with the potential risk of a NULL pointer
dereference.

The root cause of this issue is the call to kfree_skb() instead of
dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog().

This patch prevents the skb to be freed within the call to netif_rx() by
incrementing its reference count with skb_get(). The skb is finally freed by
one of the in-irq-context safe functions: dev_consume_skb_any() or
dev_kfree_skb_any(). The "any" version is used because some drivers might call
can_get_echo_skb() in a normal context.

The reason for this issue to occur is that initially, in the core network
stack, loopback skb were not supposed to be received in hardware IRQ context.
The CAN stack is an exeption.

This bug was previously reported back in 2017 in [1] but the proposed patch
never got accepted.

While [1] directly modifies net/core/dev.c, we try to propose here a
smoother modification local to CAN network stack (the assumption
behind is that only CAN devices are affected by this issue).

[1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr
Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index b70ded3760f2..73cfcd7e9517 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -538,7 +538,11 @@ unsigned int can_get_echo_skb(struct net_device *dev, unsigned int idx)
 	if (!skb)
 		return 0;
 
-	netif_rx(skb);
+	skb_get(skb);
+	if (netif_rx(skb) == NET_RX_SUCCESS)
+		dev_consume_skb_any(skb);
+	else
+		dev_kfree_skb_any(skb);
 
 	return len;
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 04/16] can: dev: can_get_len():  add a helper function to get the correct length of Classical frames
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (2 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 03/16] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard " Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 20:35   ` Oliver Hartkopp
  2020-10-19 19:05 ` [net-rfc 05/16] can: dev: __can_get_echo_skb(): fix the returned length of CAN frame Marc Kleine-Budde
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Vincent Mailhol, Marc Kleine-Budde

From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

In classical CAN, the length of the data (i.e. CAN payload) is not
always equal to the DLC! If the frame is a Remote Transmission Request
(RTR), data length is always zero regardless of DLC value and else, if
the DLC is greater than 8, the length is 8. Contrary to common belief,
ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
for Classical Frames and specifies that those DLCs shall indicate that
the data field is 8 bytes long.

Above facts are widely unknown and so many developpers uses the "len"
field of "struct canfd_frame" to get the length of classical CAN
frames: this is incorrect!

This patch introduces function get_can_len() which can be used in
remediation. The function takes the SKB as an input in order to be
able to determine if the frame is classical or FD.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
[mkl: renamed get_can_len() -> can_get_len()]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 include/linux/can/dev.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
index 41ff31795320..2bb132fc6d88 100644
--- a/include/linux/can/dev.h
+++ b/include/linux/can/dev.h
@@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
 /* map the sanitized data length to an appropriate data length code */
 u8 can_len2dlc(u8 len);
 
+/*
+ * can_get_len(skb) - get the length of the CAN payload.
+ *
+ * In classical CAN, the length of the data (i.e. CAN payload) is not
+ * always equal to the DLC! If the frame is a Remote Transmission
+ * Request (RTR), data length is always zero regardless of DLC value
+ * and else, if the DLC is greater than 8, the length is 8. Contrary
+ * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
+ * DLCs greater than 8 for Classical Frames and specifies that those
+ * DLCs shall indicate that the data field is 8 bytes long.
+ */
+static inline u8 can_get_len(const struct sk_buff *skb)
+{
+	const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
+
+	if (can_is_canfd_skb(skb))
+		return min_t(u8, cf->len, CANFD_MAX_DLEN);
+	else if (cf->can_id & CAN_RTR_FLAG)
+		return 0;
+	else
+		return min_t(u8, cf->len, CAN_MAX_DLEN);
+}
+
 struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
 				    unsigned int txqs, unsigned int rxqs);
 #define alloc_candev(sizeof_priv, echo_skb_max) \
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 05/16] can: dev: __can_get_echo_skb(): fix the returned length of CAN frame
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (3 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 06/16] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Marc Kleine-Budde
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Vincent Mailhol, Marc Kleine-Budde

From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

The length of Remote Transmission Request (RTR) frames is always 0 bytes. The
DLC represents the requested length, not the actual length of the RTR. But
__can_get_echo_skb() returns the DLC value regardless.

Apply can_get_len() function to retrieve the correct length.

This patch depends on:

    7d4b1cbca3bd can: dev: can_get_len():  add a helper function to get the correct length of Classical frames

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20201002154219.4887-5-mailhol.vincent@wanadoo.fr
Fixes: cf5046b309b3 ("can: dev: let can_get_echo_skb() return dlc of CAN frame")
[mkl: renamed get_can_len() -> can_get_len()]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 73cfcd7e9517..47b45a2223f9 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -507,14 +507,9 @@ __can_get_echo_skb(struct net_device *dev, unsigned int idx, u8 *len_ptr)
 	}
 
 	if (priv->echo_skb[idx]) {
-		/* Using "struct canfd_frame::len" for the frame
-		 * length is supported on both CAN and CANFD frames.
-		 */
 		struct sk_buff *skb = priv->echo_skb[idx];
-		struct canfd_frame *cf = (struct canfd_frame *)skb->data;
-		u8 len = cf->len;
 
-		*len_ptr = len;
+		*len_ptr = can_get_len(skb);
 		priv->echo_skb[idx] = NULL;
 
 		return skb;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 06/16] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (4 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 05/16] can: dev: __can_get_echo_skb(): fix the returned length of CAN frame Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 07/16] can: j1939: j1939_sk_bind(): return failure if netdev is down Marc Kleine-Budde
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Oleksij Rempel, Oliver Hartkopp, Marc Kleine-Budde

From: Oleksij Rempel <o.rempel@pengutronix.de>

All user space generated SKBs are owned by a socket (unless injected into the
key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned
up.

This leads to a problem when a CAN driver calls can_put_echo_skb() on a
unshared SKB. If the socket is closed prior to the TX complete handler,
can_get_echo_skb() and the subsequent delivering of the echo SKB to all
registered callbacks, a SKB with a refcount of 0 is delivered.

To avoid the problem, in can_get_echo_skb() the original SKB is now always
cloned, regardless of shared SKB or not. If the process exists it can now
safely discard its SKBs, without disturbing the delivery of the echo SKB.

The problem shows up in the j1939 stack, when it clones the incoming skb, which
detects the already 0 refcount.

We can easily reproduce this with following example:

testj1939 -B -r can0: &
cansend can0 1823ff40#0123

WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174
refcount_t: addition on 0; use-after-free.
Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan
CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010f570>] (dump_backtrace) from [<c010f90c>] (show_stack+0x20/0x24)
[<c010f8ec>] (show_stack) from [<c0c3e1a4>] (dump_stack+0x8c/0xa0)
[<c0c3e118>] (dump_stack) from [<c0127fec>] (__warn+0xe0/0x108)
[<c0127f0c>] (__warn) from [<c01283c8>] (warn_slowpath_fmt+0xa8/0xcc)
[<c0128324>] (warn_slowpath_fmt) from [<c0539c0c>] (refcount_warn_saturate+0x108/0x174)
[<c0539b04>] (refcount_warn_saturate) from [<c0ad2cac>] (j1939_can_recv+0x20c/0x210)
[<c0ad2aa0>] (j1939_can_recv) from [<c0ac9dc8>] (can_rcv_filter+0xb4/0x268)
[<c0ac9d14>] (can_rcv_filter) from [<c0aca2cc>] (can_receive+0xb0/0xe4)
[<c0aca21c>] (can_receive) from [<c0aca348>] (can_rcv+0x48/0x98)
[<c0aca300>] (can_rcv) from [<c09b1fdc>] (__netif_receive_skb_one_core+0x64/0x88)
[<c09b1f78>] (__netif_receive_skb_one_core) from [<c09b2070>] (__netif_receive_skb+0x38/0x94)
[<c09b2038>] (__netif_receive_skb) from [<c09b2130>] (netif_receive_skb_internal+0x64/0xf8)
[<c09b20cc>] (netif_receive_skb_internal) from [<c09b21f8>] (netif_receive_skb+0x34/0x19c)
[<c09b21c4>] (netif_receive_skb) from [<c0791278>] (can_rx_offload_napi_poll+0x58/0xb4)

Fixes: 0ae89beb283a ("can: add destructor for self generated skbs")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: http://lore.kernel.org/r/20200124132656.22156-1-o.rempel@pengutronix.de
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 include/linux/can/skb.h | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h
index 900b9f4e0605..fc61cf4eff1c 100644
--- a/include/linux/can/skb.h
+++ b/include/linux/can/skb.h
@@ -61,21 +61,17 @@ static inline void can_skb_set_owner(struct sk_buff *skb, struct sock *sk)
  */
 static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb)
 {
-	if (skb_shared(skb)) {
-		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+	struct sk_buff *nskb;
 
-		if (likely(nskb)) {
-			can_skb_set_owner(nskb, skb->sk);
-			consume_skb(skb);
-			return nskb;
-		} else {
-			kfree_skb(skb);
-			return NULL;
-		}
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (unlikely(!nskb)) {
+		kfree_skb(skb);
+		return NULL;
 	}
 
-	/* we can assume to have an unshared skb with proper owner */
-	return skb;
+	can_skb_set_owner(nskb, skb->sk);
+	consume_skb(skb);
+	return nskb;
 }
 
 #endif /* !_CAN_SKB_H */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 07/16] can: j1939: j1939_sk_bind(): return failure if netdev is down
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (5 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 06/16] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 08/16] can: isotp: Explain PDU in CAN_ISOTP help text Marc Kleine-Budde
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Zhang Changzhong, Oleksij Rempel, Marc Kleine-Budde

From: Zhang Changzhong <zhangchangzhong@huawei.com>

When a netdev down event occurs after a successful call to
j1939_sk_bind(), j1939_netdev_notify() can handle it correctly.

But if the netdev already in down state before calling j1939_sk_bind(),
j1939_sk_release() will stay in wait_event_interruptible() blocked
forever. Because in this case, j1939_netdev_notify() won't be called and
j1939_tp_txtimer() won't call j1939_session_cancel() or other function
to clear session for ENETDOWN error, this lead to mismatch of
j1939_session_get/put() and jsk->skb_pending will never decrease to
zero.

To reproduce it use following commands:
1. ip link add dev vcan0 type vcan
2. j1939acd -r 100,80-120 1122334455667788 vcan0
3. presses ctrl-c and thread will be blocked forever

This patch adds check for ndev->flags in j1939_sk_bind() to avoid this
kind of situation and return with -ENETDOWN.

Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1599460308-18770-1-git-send-email-zhangchangzhong@huawei.com
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/j1939/socket.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 1be4c898b2fa..f23966526a88 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -475,6 +475,12 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len)
 			goto out_release_sock;
 		}
 
+		if (!(ndev->flags & IFF_UP)) {
+			dev_put(ndev);
+			ret = -ENETDOWN;
+			goto out_release_sock;
+		}
+
 		priv = j1939_netdev_start(ndev);
 		dev_put(ndev);
 		if (IS_ERR(priv)) {
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 08/16] can: isotp: Explain PDU in CAN_ISOTP help text
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (6 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 07/16] can: j1939: j1939_sk_bind(): return failure if netdev is down Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 09/16] can: isotp: enable RX timeout handling in listen-only mode Marc Kleine-Budde
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Geert Uytterhoeven, Oliver Hartkopp, Marc Kleine-Budde

From: Geert Uytterhoeven <geert+renesas@glider.be>

The help text for the CAN_ISOTP config symbol uses the acronym "PDU".  However,
this acronym is not explained here, nor in Documentation/networking/can.rst.

Expand the acronym to make it easier for users to decide if they need to enable
the CAN_ISOTP option or not.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201013141341.28487-1-geert+renesas@glider.be
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/can/Kconfig b/net/can/Kconfig
index 224e5e0283a9..7c9958df91d3 100644
--- a/net/can/Kconfig
+++ b/net/can/Kconfig
@@ -62,8 +62,9 @@ config CAN_ISOTP
 	  communication between CAN nodes via two defined CAN Identifiers.
 	  As CAN frames can only transport a small amount of data bytes
 	  (max. 8 bytes for 'classic' CAN and max. 64 bytes for CAN FD) this
-	  segmentation is needed to transport longer PDUs as needed e.g. for
-	  vehicle diagnosis (UDS, ISO 14229) or IP-over-CAN traffic.
+	  segmentation is needed to transport longer Protocol Data Units (PDU)
+	  as needed e.g. for vehicle diagnosis (UDS, ISO 14229) or IP-over-CAN
+	  traffic.
 	  This protocol driver implements data transfers according to
 	  ISO 15765-2:2016 for 'classic' CAN and CAN FD frame types.
 	  If you want to perform automotive vehicle diagnostic services (UDS),
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 09/16] can: isotp: enable RX timeout handling in listen-only mode
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (7 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 08/16] can: isotp: Explain PDU in CAN_ISOTP help text Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 10/16] can: ti_hecc: add missed clk_disable_unprepare() in error path Marc Kleine-Budde
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Oliver Hartkopp, Thomas Wagner, Marc Kleine-Budde

From: Oliver Hartkopp <socketcan@hartkopp.net>

As reported by Thomas Wagner:

    https://github.com/hartkopp/can-isotp/issues/34

the timeout handling for data frames is not enabled when the isotp socket is
used in listen-only mode (sockopt CAN_ISOTP_LISTEN_MODE). This mode is enabled
by the isotpsniffer application which therefore became inconsistend with the
strict rx timeout rules when running the isotp protocol in the operational
mode.

This patch fixes this inconsistency by moving the return condition for the
listen-only mode behind the timeout handling code.

Reported-by: Thomas Wagner <thwa1@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://github.com/hartkopp/can-isotp/issues/34
Link: https://lore.kernel.org/r/20201019120229.89326-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/isotp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/can/isotp.c b/net/can/isotp.c
index 4c2062875893..a79287ef86da 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -569,10 +569,6 @@ static int isotp_rcv_cf(struct sock *sk, struct canfd_frame *cf, int ae,
 		return 0;
 	}
 
-	/* no creation of flow control frames */
-	if (so->opt.flags & CAN_ISOTP_LISTEN_MODE)
-		return 0;
-
 	/* perform blocksize handling, if enabled */
 	if (!so->rxfc.bs || ++so->rx.bs < so->rxfc.bs) {
 		/* start rx timeout watchdog */
@@ -581,6 +577,10 @@ static int isotp_rcv_cf(struct sock *sk, struct canfd_frame *cf, int ae,
 		return 0;
 	}
 
+	/* no creation of flow control frames */
+	if (so->opt.flags & CAN_ISOTP_LISTEN_MODE)
+		return 0;
+
 	/* we reached the specified blocksize so->rxfc.bs */
 	isotp_send_fc(sk, ae, ISOTP_FC_CTS);
 	return 0;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 10/16] can: ti_hecc: add missed clk_disable_unprepare() in error path
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (8 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 09/16] can: isotp: enable RX timeout handling in listen-only mode Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 11/16] can: xilinx_can: handle failure cases of pm_runtime_get_sync Marc Kleine-Budde
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Zhang Changzhong, Marc Kleine-Budde

From: Zhang Changzhong <zhangchangzhong@huawei.com>

The driver forgets to call clk_disable_unprepare() in error path after
a success calling for clk_prepare_enable().

Fix it by adding a clk_disable_unprepare() in error path.

Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Link: https://lore.kernel.org/r/1594973079-27743-1-git-send-email-zhangchangzhong@huawei.com
Fixes: befa60113ce7 ("can: ti_hecc: add missing prepare and unprepare of the clock")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/ti_hecc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
index 1d63006c97bc..9913f5458279 100644
--- a/drivers/net/can/ti_hecc.c
+++ b/drivers/net/can/ti_hecc.c
@@ -933,7 +933,7 @@ static int ti_hecc_probe(struct platform_device *pdev)
 	err = clk_prepare_enable(priv->clk);
 	if (err) {
 		dev_err(&pdev->dev, "clk_prepare_enable() failed\n");
-		goto probe_exit_clk;
+		goto probe_exit_release_clk;
 	}
 
 	priv->offload.mailbox_read = ti_hecc_mailbox_read;
@@ -942,7 +942,7 @@ static int ti_hecc_probe(struct platform_device *pdev)
 	err = can_rx_offload_add_timestamp(ndev, &priv->offload);
 	if (err) {
 		dev_err(&pdev->dev, "can_rx_offload_add_timestamp() failed\n");
-		goto probe_exit_clk;
+		goto probe_exit_disable_clk;
 	}
 
 	err = register_candev(ndev);
@@ -960,7 +960,9 @@ static int ti_hecc_probe(struct platform_device *pdev)
 
 probe_exit_offload:
 	can_rx_offload_del(&priv->offload);
-probe_exit_clk:
+probe_exit_disable_clk:
+	clk_disable_unprepare(priv->clk);
+probe_exit_release_clk:
 	clk_put(priv->clk);
 probe_exit_candev:
 	free_candev(ndev);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 11/16] can: xilinx_can: handle failure cases of pm_runtime_get_sync
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (9 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 10/16] can: ti_hecc: add missed clk_disable_unprepare() in error path Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 12/16] can: peak_usb: fix timestamp wrapping Marc Kleine-Budde
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Navid Emamdoost, Marc Kleine-Budde

From: Navid Emamdoost <navid.emamdoost@gmail.com>

Calling pm_runtime_get_sync increments the counter even in case of
failure, causing incorrect ref count. Call pm_runtime_put if
pm_runtime_get_sync fails.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Link: https://lore.kernel.org/r/20200605033239.60664-1-navid.emamdoost@gmail.com
Fixes: 4716620d1b62 ("can: xilinx: Convert to runtime_pm")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/xilinx_can.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
index 6c4d00d2dbdc..48d746e18f30 100644
--- a/drivers/net/can/xilinx_can.c
+++ b/drivers/net/can/xilinx_can.c
@@ -1395,7 +1395,7 @@ static int xcan_open(struct net_device *ndev)
 	if (ret < 0) {
 		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
 			   __func__, ret);
-		return ret;
+		goto err;
 	}
 
 	ret = request_irq(ndev->irq, xcan_interrupt, priv->irq_flags,
@@ -1479,6 +1479,7 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
 	if (ret < 0) {
 		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
 			   __func__, ret);
+		pm_runtime_put(priv->dev);
 		return ret;
 	}
 
@@ -1793,7 +1794,7 @@ static int xcan_probe(struct platform_device *pdev)
 	if (ret < 0) {
 		netdev_err(ndev, "%s: pm_runtime_get failed(%d)\n",
 			   __func__, ret);
-		goto err_pmdisable;
+		goto err_disableclks;
 	}
 
 	if (priv->read_reg(priv, XCAN_SR_OFFSET) != XCAN_SR_CONFIG_MASK) {
@@ -1828,7 +1829,6 @@ static int xcan_probe(struct platform_device *pdev)
 
 err_disableclks:
 	pm_runtime_put(priv->dev);
-err_pmdisable:
 	pm_runtime_disable(&pdev->dev);
 err_free:
 	free_candev(ndev);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 12/16] can: peak_usb: fix timestamp wrapping
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (10 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 11/16] can: xilinx_can: handle failure cases of pm_runtime_get_sync Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 13/16] can: peak_canfd: fix echo management when loopback is on Marc Kleine-Budde
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Stephane Grosjean, Marc Kleine-Budde

From: Stephane Grosjean <s.grosjean@peak-system.com>

Fabian Inostroza <fabianinostrozap@gmail.com> has discovered a potential
problem in the hardware timestamp reporting from the PCAN-USB USB CAN
interface (only), related to the fact that a timestamp of an event may
precede the timestamp used for synchronization when both records are part
of the same USB packet. However, this case was used to detect the wrapping
of the time counter.

This patch details and fixes the two identified cases where this problem
can occur.

Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201014085631.15128-1-s.grosjean@peak-system.com
Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++++++++++++++++++--
 1 file changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
index d91df34e7fa8..c2764799f9ef 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c
@@ -130,14 +130,55 @@ void peak_usb_get_ts_time(struct peak_time_ref *time_ref, u32 ts, ktime_t *time)
 	/* protect from getting time before setting now */
 	if (ktime_to_ns(time_ref->tv_host)) {
 		u64 delta_us;
+		s64 delta_ts = 0;
+
+		/* General case: dev_ts_1 < dev_ts_2 < ts, with:
+		 *
+		 * - dev_ts_1 = previous sync timestamp
+		 * - dev_ts_2 = last sync timestamp
+		 * - ts = event timestamp
+		 * - ts_period = known sync period (theoretical)
+		 *             ~ dev_ts2 - dev_ts1
+		 * *but*:
+		 *
+		 * - time counters wrap (see adapter->ts_used_bits)
+		 * - sometimes, dev_ts_1 < ts < dev_ts2
+		 *
+		 * "normal" case (sync time counters increase):
+		 * must take into account case when ts wraps (tsw)
+		 *
+		 *      < ts_period > <          >
+		 *     |             |            |
+		 *  ---+--------+----+-------0-+--+-->
+		 *     ts_dev_1 |    ts_dev_2  |
+		 *              ts             tsw
+		 */
+		if (time_ref->ts_dev_1 < time_ref->ts_dev_2) {
+			/* case when event time (tsw) wraps */
+			if (ts < time_ref->ts_dev_1)
+				delta_ts = 1 << time_ref->adapter->ts_used_bits;
+
+		/* Otherwise, sync time counter (ts_dev_2) has wrapped:
+		 * handle case when event time (tsn) hasn't.
+		 *
+		 *      < ts_period > <          >
+		 *     |             |            |
+		 *  ---+--------+--0-+---------+--+-->
+		 *     ts_dev_1 |    ts_dev_2  |
+		 *              tsn            ts
+		 */
+		} else if (time_ref->ts_dev_1 < ts) {
+			delta_ts = -(1 << time_ref->adapter->ts_used_bits);
+		}
 
-		delta_us = ts - time_ref->ts_dev_2;
-		if (ts < time_ref->ts_dev_2)
-			delta_us &= (1 << time_ref->adapter->ts_used_bits) - 1;
+		/* add delay between last sync and event timestamps */
+		delta_ts += (signed int)(ts - time_ref->ts_dev_2);
 
-		delta_us += time_ref->ts_total;
+		/* add time from beginning to last sync */
+		delta_ts += time_ref->ts_total;
 
-		delta_us *= time_ref->adapter->us_per_ts_scale;
+		/* convert ticks number into microseconds */
+		delta_us = delta_ts * time_ref->adapter->us_per_ts_scale;
 		delta_us >>= time_ref->adapter->us_per_ts_shift;
 
 		*time = ktime_add_us(time_ref->tv_host_0, delta_us);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 13/16] can: peak_canfd: fix echo management when loopback is on
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (11 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 12/16] can: peak_usb: fix timestamp wrapping Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 14/16] can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages Marc Kleine-Budde
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Stephane Grosjean, Marc Kleine-Budde

From: Stephane Grosjean <s.grosjean@peak-system.com>

Echo management is driven by PUCAN_MSG_LOOPED_BACK bit, while loopback
frames are identified with PUCAN_MSG_SELF_RECEIVE bit. Those bits are set
for each outgoing frame written to the IP core so that a copy of each one
will be placed into the rx path. Thus,

- when PUCAN_MSG_LOOPED_BACK is set then the rx frame is an echo of a
  previously sent frame,
- when PUCAN_MSG_LOOPED_BACK+PUCAN_MSG_SELF_RECEIVE are set, then the rx
  frame is an echo AND a loopback frame. Therefore, this frame must be
  put into the socket rx path too.

This patch fixes how CAN frames are handled when these are sent while the
can interface is configured in "loopback on" mode.

Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Link: https://lore.kernel.org/r/20201013153947.28012-1-s.grosjean@peak-system.com
Fixes: 8ac8321e4a79 ("can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/peak_canfd/peak_canfd.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/peak_canfd/peak_canfd.c b/drivers/net/can/peak_canfd/peak_canfd.c
index 10aa3e457c33..40c33b8a5fda 100644
--- a/drivers/net/can/peak_canfd/peak_canfd.c
+++ b/drivers/net/can/peak_canfd/peak_canfd.c
@@ -262,8 +262,7 @@ static int pucan_handle_can_rx(struct peak_canfd_priv *priv,
 		cf_len = get_can_dlc(pucan_msg_get_dlc(msg));
 
 	/* if this frame is an echo, */
-	if ((rx_msg_flags & PUCAN_MSG_LOOPED_BACK) &&
-	    !(rx_msg_flags & PUCAN_MSG_SELF_RECEIVE)) {
+	if (rx_msg_flags & PUCAN_MSG_LOOPED_BACK) {
 		unsigned long flags;
 
 		spin_lock_irqsave(&priv->echo_lock, flags);
@@ -277,7 +276,13 @@ static int pucan_handle_can_rx(struct peak_canfd_priv *priv,
 		netif_wake_queue(priv->ndev);
 
 		spin_unlock_irqrestore(&priv->echo_lock, flags);
-		return 0;
+
+		/* if this frame is only an echo, stop here. Otherwise,
+		 * continue to push this application self-received frame into
+		 * its own rx queue.
+		 */
+		if (!(rx_msg_flags & PUCAN_MSG_SELF_RECEIVE))
+			return 0;
 	}
 
 	/* otherwise, it should be pushed into rx fifo */
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 14/16] can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (12 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 13/16] can: peak_canfd: fix echo management when loopback is on Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 15/16] can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 16/16] can: mcp251xfd: remove unneeded break Marc Kleine-Budde
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Marc Kleine-Budde, Manivannan Sadhasivam, Thomas Kopp

During debugging it turned out that some people have setups where the SPI
communication is more prone to CRC errors.

Increase the severity of both the transfer retry and transfer failure message
to give users feedback without the need to recompile the driver with debug
enabled.

Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
index ba25902dd78c..c9ffc5ea2b25 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
@@ -330,17 +330,17 @@ mcp251xfd_regmap_crc_read(void *context,
 			goto out;
 		}
 
-		netdev_dbg(priv->ndev,
-			   "CRC read error at address 0x%04x (length=%zd, data=%*ph, CRC=0x%04x) retrying.\n",
-			   reg, val_len, (int)val_len, buf_rx->data,
-			   get_unaligned_be16(buf_rx->data + val_len));
-	}
-
-	if (err) {
 		netdev_info(priv->ndev,
-			    "CRC read error at address 0x%04x (length=%zd, data=%*ph, CRC=0x%04x).\n",
+			    "CRC read error at address 0x%04x (length=%zd, data=%*ph, CRC=0x%04x) retrying.\n",
 			    reg, val_len, (int)val_len, buf_rx->data,
 			    get_unaligned_be16(buf_rx->data + val_len));
+	}
+
+	if (err) {
+		netdev_err(priv->ndev,
+			   "CRC read error at address 0x%04x (length=%zd, data=%*ph, CRC=0x%04x).\n",
+			   reg, val_len, (int)val_len, buf_rx->data,
+			   get_unaligned_be16(buf_rx->data + val_len));
 
 		return err;
 	}
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 15/16] can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (13 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 14/16] can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  2020-10-19 19:05 ` [net-rfc 16/16] can: mcp251xfd: remove unneeded break Marc Kleine-Budde
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, kernel test robot, Marc Kleine-Budde

From: kernel test robot <lkp@intel.com>

drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c:176:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 875347fe5756 ("can: mcp25xxfd: add regmap infrastructure")
Signed-off-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201019120805.GA63693@ae4257e0ab22
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
index c9ffc5ea2b25..314f868b3465 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-regmap.c
@@ -173,7 +173,7 @@ mcp251xfd_regmap_nocrc_read(void *context,
 		memcpy(&buf_tx->cmd, reg, sizeof(buf_tx->cmd));
 		if (MCP251XFD_SANITIZE_SPI)
 			memset(buf_tx->data, 0x0, val_len);
-	};
+	}
 
 	err = spi_sync(spi, &msg);
 	if (err)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [net-rfc 16/16] can: mcp251xfd: remove unneeded break
  2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
                   ` (14 preceding siblings ...)
  2020-10-19 19:05 ` [net-rfc 15/16] can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings Marc Kleine-Budde
@ 2020-10-19 19:05 ` Marc Kleine-Budde
  15 siblings, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-19 19:05 UTC (permalink / raw)
  To: linux-can; +Cc: kernel, Tom Rix, Marc Kleine-Budde

From: Tom Rix <trix@redhat.com>

A break is not needed if it is preceded by a return.

Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201019172412.31143-1-trix@redhat.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 .../net/can/spi/mcp251xfd/mcp251xfd-core.c    | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index c3f49543ff26..9c215f7c5f81 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -75,11 +75,11 @@ static const char *__mcp251xfd_get_model_str(enum mcp251xfd_model model)
 {
 	switch (model) {
 	case MCP251XFD_MODEL_MCP2517FD:
-		return "MCP2517FD"; break;
+		return "MCP2517FD";
 	case MCP251XFD_MODEL_MCP2518FD:
-		return "MCP2518FD"; break;
+		return "MCP2518FD";
 	case MCP251XFD_MODEL_MCP251XFD:
-		return "MCP251xFD"; break;
+		return "MCP251xFD";
 	}
 
 	return "<unknown>";
@@ -95,21 +95,21 @@ static const char *mcp251xfd_get_mode_str(const u8 mode)
 {
 	switch (mode) {
 	case MCP251XFD_REG_CON_MODE_MIXED:
-		return "Mixed (CAN FD/CAN 2.0)"; break;
+		return "Mixed (CAN FD/CAN 2.0)";
 	case MCP251XFD_REG_CON_MODE_SLEEP:
-		return "Sleep"; break;
+		return "Sleep";
 	case MCP251XFD_REG_CON_MODE_INT_LOOPBACK:
-		return "Internal Loopback"; break;
+		return "Internal Loopback";
 	case MCP251XFD_REG_CON_MODE_LISTENONLY:
-		return "Listen Only"; break;
+		return "Listen Only";
 	case MCP251XFD_REG_CON_MODE_CONFIG:
-		return "Configuration"; break;
+		return "Configuration";
 	case MCP251XFD_REG_CON_MODE_EXT_LOOPBACK:
-		return "External Loopback"; break;
+		return "External Loopback";
 	case MCP251XFD_REG_CON_MODE_CAN2_0:
-		return "CAN 2.0"; break;
+		return "CAN 2.0";
 	case MCP251XFD_REG_CON_MODE_RESTRICTED:
-		return "Restricted Operation"; break;
+		return "Restricted Operation";
 	}
 
 	return "<unknown>";
-- 
2.28.0


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-19 19:05 ` [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames Marc Kleine-Budde
@ 2020-10-19 20:35   ` Oliver Hartkopp
  2020-10-20  6:35     ` Marc Kleine-Budde
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-19 20:35 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can; +Cc: kernel, Vincent Mailhol



On 19.10.20 21:05, Marc Kleine-Budde wrote:
> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> 
> In classical CAN, the length of the data (i.e. CAN payload) is not
> always equal to the DLC! If the frame is a Remote Transmission Request
> (RTR), data length is always zero regardless of DLC value and else, if
> the DLC is greater than 8, the length is 8. Contrary to common belief,
> ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
> for Classical Frames and specifies that those DLCs shall indicate that
> the data field is 8 bytes long.
> 
> Above facts are widely unknown and so many developpers uses the "len"
> field of "struct canfd_frame" to get the length of classical CAN
> frames: this is incorrect!
> 
> This patch introduces function get_can_len() which can be used in
> remediation. The function takes the SKB as an input in order to be
> able to determine if the frame is classical or FD.
> 
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
> [mkl: renamed get_can_len() -> can_get_len()]
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
>   include/linux/can/dev.h | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
> index 41ff31795320..2bb132fc6d88 100644
> --- a/include/linux/can/dev.h
> +++ b/include/linux/can/dev.h
> @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
>   /* map the sanitized data length to an appropriate data length code */
>   u8 can_len2dlc(u8 len);
>   
> +/*
> + * can_get_len(skb) - get the length of the CAN payload.
> + *
> + * In classical CAN, the length of the data (i.e. CAN payload) is not
> + * always equal to the DLC! If the frame is a Remote Transmission
> + * Request (RTR), data length is always zero regardless of DLC value
> + * and else, if the DLC is greater than 8, the length is 8. Contrary
> + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
> + * DLCs greater than 8 for Classical Frames and specifies that those
> + * DLCs shall indicate that the data field is 8 bytes long.
> + */
> +static inline u8 can_get_len(const struct sk_buff *skb)
> +{
> +	const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
> +
> +	if (can_is_canfd_skb(skb))
> +		return min_t(u8, cf->len, CANFD_MAX_DLEN);
> +	else if (cf->can_id & CAN_RTR_FLAG)
> +		return 0;
> +	else
> +		return min_t(u8, cf->len, CAN_MAX_DLEN);
> +}

The main idea behind this patch and patch 05/16 is to provide a correct 
statistic in the tx bytes, right?

A simple test for the CAN_RTR_FLAG will do the job as all the length 
sanitizing is already done in the tx path by can_dropped_invalid_skb() 
in all known drivers right *before* the skb is stored in the echo skb array.

IMO there's no need for a separate helper function. Maybe a macro which 
should have something with 'payload' in its name - to determine the tx 
byte statistics based on CAN_RTR_FLAG ...

Regards,
Oliver

> +
>   struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
>   				    unsigned int txqs, unsigned int rxqs);
>   #define alloc_candev(sizeof_priv, echo_skb_max) \
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-19 20:35   ` Oliver Hartkopp
@ 2020-10-20  6:35     ` Marc Kleine-Budde
  2020-10-20 11:30       ` Vincent Mailhol
  0 siblings, 1 reply; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-20  6:35 UTC (permalink / raw)
  To: Oliver Hartkopp, linux-can; +Cc: Vincent Mailhol, kernel

[-- Attachment #1.1: Type: text/plain, Size: 3533 bytes --]

On 10/19/20 10:35 PM, Oliver Hartkopp wrote:
> 
> 
> On 19.10.20 21:05, Marc Kleine-Budde wrote:
>> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>
>> In classical CAN, the length of the data (i.e. CAN payload) is not
>> always equal to the DLC! If the frame is a Remote Transmission Request
>> (RTR), data length is always zero regardless of DLC value and else, if
>> the DLC is greater than 8, the length is 8. Contrary to common belief,
>> ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
>> for Classical Frames and specifies that those DLCs shall indicate that
>> the data field is 8 bytes long.
>>
>> Above facts are widely unknown and so many developpers uses the "len"
>> field of "struct canfd_frame" to get the length of classical CAN
>> frames: this is incorrect!
>>
>> This patch introduces function get_can_len() which can be used in
>> remediation. The function takes the SKB as an input in order to be
>> able to determine if the frame is classical or FD.
>>
>> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>> Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
>> [mkl: renamed get_can_len() -> can_get_len()]
>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>> ---
>>   include/linux/can/dev.h | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
>> index 41ff31795320..2bb132fc6d88 100644
>> --- a/include/linux/can/dev.h
>> +++ b/include/linux/can/dev.h
>> @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
>>   /* map the sanitized data length to an appropriate data length code */
>>   u8 can_len2dlc(u8 len);
>>   
>> +/*
>> + * can_get_len(skb) - get the length of the CAN payload.
>> + *
>> + * In classical CAN, the length of the data (i.e. CAN payload) is not
>> + * always equal to the DLC! If the frame is a Remote Transmission
>> + * Request (RTR), data length is always zero regardless of DLC value
>> + * and else, if the DLC is greater than 8, the length is 8. Contrary
>> + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
>> + * DLCs greater than 8 for Classical Frames and specifies that those
>> + * DLCs shall indicate that the data field is 8 bytes long.
>> + */
>> +static inline u8 can_get_len(const struct sk_buff *skb)
>> +{
>> +	const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
>> +
>> +	if (can_is_canfd_skb(skb))
>> +		return min_t(u8, cf->len, CANFD_MAX_DLEN);
>> +	else if (cf->can_id & CAN_RTR_FLAG)
>> +		return 0;
>> +	else
>> +		return min_t(u8, cf->len, CAN_MAX_DLEN);
>> +}
> 
> The main idea behind this patch and patch 05/16 is to provide a correct 
> statistic in the tx bytes, right?
> 
> A simple test for the CAN_RTR_FLAG will do the job as all the length 
> sanitizing is already done in the tx path by can_dropped_invalid_skb() 
> in all known drivers right *before* the skb is stored in the echo skb array.
> 
> IMO there's no need for a separate helper function. Maybe a macro which 
> should have something with 'payload' in its name - to determine the tx 
> byte statistics based on CAN_RTR_FLAG ...

Good point!

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20  6:35     ` Marc Kleine-Budde
@ 2020-10-20 11:30       ` Vincent Mailhol
  2020-10-20 11:48         ` Marc Kleine-Budde
  2020-10-20 12:38         ` Oliver Hartkopp
  0 siblings, 2 replies; 45+ messages in thread
From: Vincent Mailhol @ 2020-10-20 11:30 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oliver Hartkopp, linux-can; +Cc: Vincent Mailhol, kernel

> On 19.10.20 21:05, Marc Kleine-Budde wrote:
> > From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> >
> > In classical CAN, the length of the data (i.e. CAN payload) is not
> > always equal to the DLC! If the frame is a Remote Transmission Request
> > (RTR), data length is always zero regardless of DLC value and else, if
> > the DLC is greater than 8, the length is 8. Contrary to common belief,
> > ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
> > for Classical Frames and specifies that those DLCs shall indicate that
> > the data field is 8 bytes long.
> >
> > Above facts are widely unknown and so many developpers uses the "len"
> > field of "struct canfd_frame" to get the length of classical CAN
> > frames: this is incorrect!
> >
> > This patch introduces function get_can_len() which can be used in
> > remediation. The function takes the SKB as an input in order to be
> > able to determine if the frame is classical or FD.
> >
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
> > [mkl: renamed get_can_len() -> can_get_len()]
> > Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> > ---
> >   include/linux/can/dev.h | 23 +++++++++++++++++++++++
> >   1 file changed, 23 insertions(+)
> >
> > diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
> > index 41ff31795320..2bb132fc6d88 100644
> > --- a/include/linux/can/dev.h
> > +++ b/include/linux/can/dev.h
> > @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
> >   /* map the sanitized data length to an appropriate data length code */
> >   u8 can_len2dlc(u8 len);
> >   
> > +/*
> > + * can_get_len(skb) - get the length of the CAN payload.
> > + *
> > + * In classical CAN, the length of the data (i.e. CAN payload) is not
> > + * always equal to the DLC! If the frame is a Remote Transmission
> > + * Request (RTR), data length is always zero regardless of DLC value
> > + * and else, if the DLC is greater than 8, the length is 8. Contrary
> > + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
> > + * DLCs greater than 8 for Classical Frames and specifies that those
> > + * DLCs shall indicate that the data field is 8 bytes long.
> > + */
> > +static inline u8 can_get_len(const struct sk_buff *skb)
> > +{
> > +     const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
> > +
> > +     if (can_is_canfd_skb(skb))
> > +             return min_t(u8, cf->len, CANFD_MAX_DLEN);
> > +     else if (cf->can_id & CAN_RTR_FLAG)
> > +             return 0;
> > +     else
> > +             return min_t(u8, cf->len, CAN_MAX_DLEN);
> > +}
> 
> The main idea behind this patch and patch 05/16 is to provide a correct
> statistic in the tx bytes, right?
>
> A simple test for the CAN_RTR_FLAG will do the job as all the length
> sanitizing is already done in the tx path by can_dropped_invalid_skb()
> in all known drivers right *before* the skb is stored in the echo skb array.
>
> IMO there's no need for a separate helper function. Maybe a macro which
> should have something with 'payload' in its name - to determine the tx
> byte statistics based on CAN_RTR_FLAG ...

Actually, the main idea is not only to provide a valid length for the
tx statistics.

First fact is that many of the drivers (if not all?) will have the
same issue for the rx statistics as well, so this helper function can
be beneficial in other locations as well but that is not yet the main
point.

What bugs me the most, is that there is a global misunderstanding of
the definition of the Classical CAN frame's DLC in the kernel.

ISO 11898-1:2015 tells us in section 8.4.2.4 "DLC field" that, I
quote: "[...]  *This DLC shall consist of 4 bit*. The admissible
number of data bytes for Classical Frames shall range from 0 to
8. DLCs in the range of 0 to 7 shall indicate data fields of length of
0 byte to 7 byte. In Classical Frames, *all other DLCs shall indicate
that the data field is 8 bytes long*. [...]"

So the DLC is a 4 bits value (meaning from 0 to 15) and all values
from 8 to 15 designate a data length of 8.

The real idea is to have an ISO 11898-1 compliant function to retrieve
the length.

That said, I hear your comment that the DLC is sanitized in
can_dropped_invalid_skb(). However, what bothers me is that this
sanitazation is done on false assumptions: Classical frames with DLC
greater than 8 and lower or equal to 15, which are valid by the
standard, are being discarded.

To give you the full picture, I plan to send a RFC to fix this DLC
issue to allow Classical CAN frame with DLCs between 9 and 15 to be
sent and received in Socket CAN.  This can_get_len() function is one
piece of it, however, because I also found the bad RTR length issue, I
thought to introduce this one in advance. Sorry for the lack of
context.

The RFC I am thinking of will not be trivial. The main reason is that
the macro CAN_MAX_DLC which is incorrectly set to 8 instead of 15 is
exposed to the user land in include/linux/can.h. Modifying it would be
a no-go because we do not want to break user space. The direction
would be to have a socket option to set the maximum DLC to 15 on
demand and to keep it to 8 by default (this way, niche users who needs
this can do so but other people are not impacted).

One question you might ask is: "why should be allow DLC greater than
8?". One concrete use case is security testing. In order to check for
vulnerabilities, we want to send such frames for test coverage
(especially during fuzz testing). Aside of that I do not know other
use cases. I am not aware of any OEM that would use this in production
but I still want to push to have this option in the kernel just for
the security testing reason.

I hope that you now understand the full idea behind this patch. If you
agree with my comments, please reconsider adding that patch. If I
failed to convince you and if you prefer to first see the full
picture, then I am OK to go with the simple test for the CAN_RTR_FLAG
as you suggested in your other patch and will come back to you later
on the MAX DLC topic when ready.

Thanks for reading me until the end!


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 11:30       ` Vincent Mailhol
@ 2020-10-20 11:48         ` Marc Kleine-Budde
  2020-10-20 12:38         ` Oliver Hartkopp
  1 sibling, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-20 11:48 UTC (permalink / raw)
  To: Vincent Mailhol, Oliver Hartkopp, linux-can; +Cc: kernel

[-- Attachment #1.1: Type: text/plain, Size: 6949 bytes --]

On 10/20/20 1:30 PM, Vincent Mailhol wrote:
>> On 19.10.20 21:05, Marc Kleine-Budde wrote:
>>> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>>
>>> In classical CAN, the length of the data (i.e. CAN payload) is not
>>> always equal to the DLC! If the frame is a Remote Transmission Request
>>> (RTR), data length is always zero regardless of DLC value and else, if
>>> the DLC is greater than 8, the length is 8. Contrary to common belief,
>>> ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
>>> for Classical Frames and specifies that those DLCs shall indicate that
>>> the data field is 8 bytes long.
>>>
>>> Above facts are widely unknown and so many developpers uses the "len"
>>> field of "struct canfd_frame" to get the length of classical CAN
>>> frames: this is incorrect!
>>>
>>> This patch introduces function get_can_len() which can be used in
>>> remediation. The function takes the SKB as an input in order to be
>>> able to determine if the frame is classical or FD.
>>>
>>> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>> Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
>>> [mkl: renamed get_can_len() -> can_get_len()]
>>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>> ---
>>>   include/linux/can/dev.h | 23 +++++++++++++++++++++++
>>>   1 file changed, 23 insertions(+)
>>>
>>> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
>>> index 41ff31795320..2bb132fc6d88 100644
>>> --- a/include/linux/can/dev.h
>>> +++ b/include/linux/can/dev.h
>>> @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
>>>   /* map the sanitized data length to an appropriate data length code */
>>>   u8 can_len2dlc(u8 len);
>>>   
>>> +/*
>>> + * can_get_len(skb) - get the length of the CAN payload.
>>> + *
>>> + * In classical CAN, the length of the data (i.e. CAN payload) is not
>>> + * always equal to the DLC! If the frame is a Remote Transmission
>>> + * Request (RTR), data length is always zero regardless of DLC value
>>> + * and else, if the DLC is greater than 8, the length is 8. Contrary
>>> + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
>>> + * DLCs greater than 8 for Classical Frames and specifies that those
>>> + * DLCs shall indicate that the data field is 8 bytes long.
>>> + */
>>> +static inline u8 can_get_len(const struct sk_buff *skb)
>>> +{
>>> +     const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
>>> +
>>> +     if (can_is_canfd_skb(skb))
>>> +             return min_t(u8, cf->len, CANFD_MAX_DLEN);
>>> +     else if (cf->can_id & CAN_RTR_FLAG)
>>> +             return 0;
>>> +     else
>>> +             return min_t(u8, cf->len, CAN_MAX_DLEN);
>>> +}
>>
>> The main idea behind this patch and patch 05/16 is to provide a correct
>> statistic in the tx bytes, right?
>>
>> A simple test for the CAN_RTR_FLAG will do the job as all the length
>> sanitizing is already done in the tx path by can_dropped_invalid_skb()
>> in all known drivers right *before* the skb is stored in the echo skb array.
>>
>> IMO there's no need for a separate helper function. Maybe a macro which
>> should have something with 'payload' in its name - to determine the tx
>> byte statistics based on CAN_RTR_FLAG ...
> 
> Actually, the main idea is not only to provide a valid length for the
> tx statistics.
> 
> First fact is that many of the drivers (if not all?) will have the
> same issue for the rx statistics as well, so this helper function can
> be beneficial in other locations as well but that is not yet the main
> point.
> 
> What bugs me the most, is that there is a global misunderstanding of
> the definition of the Classical CAN frame's DLC in the kernel.
> 
> ISO 11898-1:2015 tells us in section 8.4.2.4 "DLC field" that, I
> quote: "[...]  *This DLC shall consist of 4 bit*. The admissible
> number of data bytes for Classical Frames shall range from 0 to
> 8. DLCs in the range of 0 to 7 shall indicate data fields of length of
> 0 byte to 7 byte. In Classical Frames, *all other DLCs shall indicate
> that the data field is 8 bytes long*. [...]"
> 
> So the DLC is a 4 bits value (meaning from 0 to 15) and all values
> from 8 to 15 designate a data length of 8.
> 
> The real idea is to have an ISO 11898-1 compliant function to retrieve
> the length.
> 
> That said, I hear your comment that the DLC is sanitized in
> can_dropped_invalid_skb(). However, what bothers me is that this
> sanitazation is done on false assumptions: Classical frames with DLC
> greater than 8 and lower or equal to 15, which are valid by the
> standard, are being discarded.
> 
> To give you the full picture, I plan to send a RFC to fix this DLC
> issue to allow Classical CAN frame with DLCs between 9 and 15 to be
> sent and received in Socket CAN.  This can_get_len() function is one
> piece of it, however, because I also found the bad RTR length issue, I
> thought to introduce this one in advance. Sorry for the lack of
> context.
> 
> The RFC I am thinking of will not be trivial. The main reason is that
> the macro CAN_MAX_DLC which is incorrectly set to 8 instead of 15 is
> exposed to the user land in include/linux/can.h. Modifying it would be
> a no-go because we do not want to break user space. The direction
> would be to have a socket option to set the maximum DLC to 15 on
> demand and to keep it to 8 by default (this way, niche users who needs
> this can do so but other people are not impacted).
> 
> One question you might ask is: "why should be allow DLC greater than
> 8?". One concrete use case is security testing. In order to check for
> vulnerabilities, we want to send such frames for test coverage
> (especially during fuzz testing). Aside of that I do not know other
> use cases. I am not aware of any OEM that would use this in production
> but I still want to push to have this option in the kernel just for
> the security testing reason.
> 
> I hope that you now understand the full idea behind this patch. If you
> agree with my comments, please reconsider adding that patch. If I
> failed to convince you and if you prefer to first see the full
> picture, then I am OK to go with the simple test for the CAN_RTR_FLAG
> as you suggested in your other patch and will come back to you later
> on the MAX DLC topic when ready.
> 
> Thanks for reading me until the end!

Ok, I see your point. I'll keep Oliver's patch on linux-can/testing.

If you want to improve the stack toward more 11898-1:2015 complience please make
that a dedicated series.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 11:30       ` Vincent Mailhol
  2020-10-20 11:48         ` Marc Kleine-Budde
@ 2020-10-20 12:38         ` Oliver Hartkopp
  2020-10-20 15:02           ` Marc Kleine-Budde
  2020-10-20 16:07           ` Vincent Mailhol
  1 sibling, 2 replies; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-20 12:38 UTC (permalink / raw)
  To: Vincent Mailhol, Marc Kleine-Budde, linux-can; +Cc: kernel



On 20.10.20 13:30, Vincent Mailhol wrote:
>> On 19.10.20 21:05, Marc Kleine-Budde wrote:
>>> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>>
>>> In classical CAN, the length of the data (i.e. CAN payload) is not
>>> always equal to the DLC! If the frame is a Remote Transmission Request
>>> (RTR), data length is always zero regardless of DLC value and else, if
>>> the DLC is greater than 8, the length is 8. Contrary to common belief,
>>> ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
>>> for Classical Frames and specifies that those DLCs shall indicate that
>>> the data field is 8 bytes long.
>>>
>>> Above facts are widely unknown and so many developpers uses the "len"
>>> field of "struct canfd_frame" to get the length of classical CAN
>>> frames: this is incorrect!

No it is not. This is intentional.

When implementing the CAN FD support we mainly fixed the problem that 
the DLC (data length *CODE*) and the plain data length have often been 
used as a synonym by the programmers.

As the can_dlc value inside the Linux world is always 0..8 it was in 
fact always a 'length in byte' value. And it was safe to use it in this way.

When CAN FD showed up we decided NOT to provide the DLC value but a 
nominal 'len' value 0..64 which allows to maintain the existing access 
patterns (e.g. with a for() statement).

See:

https://wiki.automotivelinux.org/_media/agl-distro/agl2017-socketcan-print.pdf

page 56ff

>>>
>>> This patch introduces function get_can_len() which can be used in
>>> remediation. The function takes the SKB as an input in order to be
>>> able to determine if the frame is classical or FD.
>>>
>>> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>> Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
>>> [mkl: renamed get_can_len() -> can_get_len()]
>>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>> ---
>>>    include/linux/can/dev.h | 23 +++++++++++++++++++++++
>>>    1 file changed, 23 insertions(+)
>>>
>>> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
>>> index 41ff31795320..2bb132fc6d88 100644
>>> --- a/include/linux/can/dev.h
>>> +++ b/include/linux/can/dev.h
>>> @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
>>>    /* map the sanitized data length to an appropriate data length code */
>>>    u8 can_len2dlc(u8 len);
>>>    
>>> +/*
>>> + * can_get_len(skb) - get the length of the CAN payload.
>>> + *
>>> + * In classical CAN, the length of the data (i.e. CAN payload) is not
>>> + * always equal to the DLC! If the frame is a Remote Transmission
>>> + * Request (RTR), data length is always zero regardless of DLC value
>>> + * and else, if the DLC is greater than 8, the length is 8. Contrary
>>> + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
>>> + * DLCs greater than 8 for Classical Frames and specifies that those
>>> + * DLCs shall indicate that the data field is 8 bytes long.
>>> + */
>>> +static inline u8 can_get_len(const struct sk_buff *skb)
>>> +{
>>> +     const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
>>> +
>>> +     if (can_is_canfd_skb(skb))
>>> +             return min_t(u8, cf->len, CANFD_MAX_DLEN);
>>> +     else if (cf->can_id & CAN_RTR_FLAG)
>>> +             return 0;
>>> +     else
>>> +             return min_t(u8, cf->len, CAN_MAX_DLEN);
>>> +}
>>
>> The main idea behind this patch and patch 05/16 is to provide a correct
>> statistic in the tx bytes, right?
>>
>> A simple test for the CAN_RTR_FLAG will do the job as all the length
>> sanitizing is already done in the tx path by can_dropped_invalid_skb()
>> in all known drivers right *before* the skb is stored in the echo skb array.
>>
>> IMO there's no need for a separate helper function. Maybe a macro which
>> should have something with 'payload' in its name - to determine the tx
>> byte statistics based on CAN_RTR_FLAG ...
> 
> Actually, the main idea is not only to provide a valid length for the
> tx statistics.
> 
> First fact is that many of the drivers (if not all?) will have the
> same issue for the rx statistics as well, so this helper function can
> be beneficial in other locations as well but that is not yet the main
> point.

Many drivers do not use the the return value of can_get_echo_skb() - and 
yes, THAT should be fixed.


> What bugs me the most, is that there is a global misunderstanding of
> the definition of the Classical CAN frame's DLC in the kernel.

No it is not (as described above).

There is a clear separation that the DLC is only available in the driver 
for putting the DLC into the CAN controller and retrieving it from there.

Everything else (starting from the skb content) is a plain length 
information.

> ISO 11898-1:2015 tells us in section 8.4.2.4 "DLC field" that, I
> quote: "[...]  *This DLC shall consist of 4 bit*. The admissible
> number of data bytes for Classical Frames shall range from 0 to
> 8. DLCs in the range of 0 to 7 shall indicate data fields of length of
> 0 byte to 7 byte. In Classical Frames, *all other DLCs shall indicate
> that the data field is 8 bytes long*. [...]"
> 
> So the DLC is a 4 bits value (meaning from 0 to 15) and all values
> from 8 to 15 designate a data length of 8.
> 
> The real idea is to have an ISO 11898-1 compliant function to retrieve
> the length.

This is done on driver level using several helper functions like:

https://elixir.bootlin.com/linux/latest/source/include/linux/can/dev.h#L92

get_can_dlc()
get_canfd_dlc()

and conversions like

https://elixir.bootlin.com/linux/latest/source/include/linux/can/dev.h#L174

can_dlc2len()
can_len2dlc()


> 
> That said, I hear your comment that the DLC is sanitized in
> can_dropped_invalid_skb(). However, what bothers me is that this
> sanitazation is done on false assumptions: Classical frames with DLC
> greater than 8 and lower or equal to 15, which are valid by the
> standard, are being discarded.

Yes. But we don't talk about a DLC value here. We talk about a plain 
length value.

> To give you the full picture, I plan to send a RFC to fix this DLC
> issue to allow Classical CAN frame with DLCs between 9 and 15 to be
> sent and received in Socket CAN.

Ehm - no. This is a pointless feature to allow values that have no 
effect and might only happen on the wire (not even on the CAN controller 
register level).

The standard only defined how the CAN controller should react when it 
gets these out-of-range values from the wire. The values 9 - 15 are not 
distinguishable. Everything means 8.

>  This can_get_len() function is one
> piece of it, however, because I also found the bad RTR length issue, I
> thought to introduce this one in advance. Sorry for the lack of
> context.
> 
> The RFC I am thinking of will not be trivial. The main reason is that
> the macro CAN_MAX_DLC which is incorrectly set to 8 instead of 15 is
> exposed to the user land in include/linux/can.h. Modifying it would be
> a no-go because we do not want to break user space. The direction
> would be to have a socket option to set the maximum DLC to 15 on
> demand and to keep it to 8 by default (this way, niche users who needs
> this can do so but other people are not impacted).
> 
> One question you might ask is: "why should be allow DLC greater than
> 8?". One concrete use case is security testing. In order to check for
> vulnerabilities, we want to send such frames for test coverage
> (especially during fuzz testing). Aside of that I do not know other
> use cases. I am not aware of any OEM that would use this in production
> but I still want to push to have this option in the kernel just for
> the security testing reason.

??? Sorry but no.

You can create a DLC of 13 for Classical CAN. And your CAN controller 
gets 8 bytes from it and says to the network layer: Here is a CAN frame 
with 8 bytes payload.

Have you ever read out a CAN controller register and found a value 
greater than 8 on the register of the DLC value?

I believe you didn't. The CAN controller sees the DLC but provides 8 
bytes as DLC=8.

> I hope that you now understand the full idea behind this patch. If you
> agree with my comments, please reconsider adding that patch.

I don't believe it would be a good idea.

> If I
> failed to convince you and if you prefer to first see the full
> picture, then I am OK to go with the simple test for the CAN_RTR_FLAG
> as you suggested in your other patch and will come back to you later
> on the MAX DLC topic when ready.

I hope I was able to convince you :-)

> Thanks for reading me until the end!

I did my very best.

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 12:38         ` Oliver Hartkopp
@ 2020-10-20 15:02           ` Marc Kleine-Budde
  2020-10-20 16:07           ` Vincent Mailhol
  1 sibling, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-20 15:02 UTC (permalink / raw)
  To: Oliver Hartkopp, Vincent Mailhol, linux-can; +Cc: kernel

[-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --]

On 10/20/20 2:38 PM, Oliver Hartkopp wrote:
[...]

> You can create a DLC of 13 for Classical CAN. And your CAN controller 
> gets 8 bytes from it and says to the network layer: Here is a CAN frame 
> with 8 bytes payload.
> 
> Have you ever read out a CAN controller register and found a value 
> greater than 8 on the register of the DLC value?
> 
> I believe you didn't. The CAN controller sees the DLC but provides 8 
> bytes as DLC=8.

I just checked on the flexcan (mx6) and the mcp2518fd. I hardcoded a TX-dlc of
12 in both drivers and printed the raw dlc value in the RX-path:

> Oct 20 16:46:08 riot kernel: flexcan_mailbox_read: dlc=12
> Oct 20 16:46:10 riot kernel: mcp251xfd_hw_rx_obj_to_skb: dlc=12

The userspace shows a consistent len=8.

If we want to support CAN-2.0 with DLC > 8, we have to audit the TX-path every
driver and remove the can_dropped_invalid_skb() check.

I'm not sure what to do about the RX, with breaking user space ABI.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 12:38         ` Oliver Hartkopp
  2020-10-20 15:02           ` Marc Kleine-Budde
@ 2020-10-20 16:07           ` Vincent Mailhol
  2020-10-20 17:04             ` Oliver Hartkopp
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent Mailhol @ 2020-10-20 16:07 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oliver Hartkopp, linux-can; +Cc: Vincent Mailhol, kernel

> On 20.10.20 13:30, Vincent Mailhol wrote:
>>> On 19.10.20 21:05, Marc Kleine-Budde wrote:
>>>> From: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>>>
>>>> In classical CAN, the length of the data (i.e. CAN payload) is not
>>>> always equal to the DLC! If the frame is a Remote Transmission Request
>>>> (RTR), data length is always zero regardless of DLC value and else, if
>>>> the DLC is greater than 8, the length is 8. Contrary to common belief,
>>>> ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow DLCs greater than 8
>>>> for Classical Frames and specifies that those DLCs shall indicate that
>>>> the data field is 8 bytes long.
>>>>
>>>> Above facts are widely unknown and so many developpers uses the "len"
>>>> field of "struct canfd_frame" to get the length of classical CAN
>>>> frames: this is incorrect!
>
> No it is not. This is intentional.
>
> When implementing the CAN FD support we mainly fixed the problem that 
> the DLC (data length *CODE*) and the plain data length have often been 
> used as a synonym by the programmers.
>
> As the can_dlc value inside the Linux world is always 0..8 it was in 
> fact always a 'length in byte' value. And it was safe to use it in this way.
>
> When CAN FD showed up we decided NOT to provide the DLC value but a 
> nominal 'len' value 0..64 which allows to maintain the existing access 
> patterns (e.g. with a for() statement).
>
> See:
>
> https://wiki.automotivelinux.org/_media/agl-distro/agl2017-socketcan-print.pdf
>
> page 56ff
>
>>>>
>>>> This patch introduces function get_can_len() which can be used in
>>>> remediation. The function takes the SKB as an input in order to be
>>>> able to determine if the frame is classical or FD.
>>>>
>>>> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>>>> Link: https://lore.kernel.org/r/20201002154219.4887-4-mailhol.vincent@wanadoo.fr
>>>> [mkl: renamed get_can_len() -> can_get_len()]
>>>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>>> ---
>>>>    include/linux/can/dev.h | 23 +++++++++++++++++++++++
>>>>    1 file changed, 23 insertions(+)
>>>>
>>>> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
>>>> index 41ff31795320..2bb132fc6d88 100644
>>>> --- a/include/linux/can/dev.h
>>>> +++ b/include/linux/can/dev.h
>>>> @@ -192,6 +192,29 @@ u8 can_dlc2len(u8 can_dlc);
>>>>    /* map the sanitized data length to an appropriate data length code */
>>>>    u8 can_len2dlc(u8 len);
>>>>    
>>>> +/*
>>>> + * can_get_len(skb) - get the length of the CAN payload.
>>>> + *
>>>> + * In classical CAN, the length of the data (i.e. CAN payload) is not
>>>> + * always equal to the DLC! If the frame is a Remote Transmission
>>>> + * Request (RTR), data length is always zero regardless of DLC value
>>>> + * and else, if the DLC is greater than 8, the length is 8. Contrary
>>>> + * to common belief, ISO 11898-1 Chapter 8.4.2.3 (DLC field) do allow
>>>> + * DLCs greater than 8 for Classical Frames and specifies that those
>>>> + * DLCs shall indicate that the data field is 8 bytes long.
>>>> + */
>>>> +static inline u8 can_get_len(const struct sk_buff *skb)
>>>> +{
>>>> +     const struct canfd_frame *cf = (const struct canfd_frame *)skb->data;
>>>> +
>>>> +     if (can_is_canfd_skb(skb))
>>>> +             return min_t(u8, cf->len, CANFD_MAX_DLEN);
>>>> +     else if (cf->can_id & CAN_RTR_FLAG)
>>>> +             return 0;
>>>> +     else
>>>> +             return min_t(u8, cf->len, CAN_MAX_DLEN);
>>>> +}
>>>
>>> The main idea behind this patch and patch 05/16 is to provide a correct
>>> statistic in the tx bytes, right?
>>>
>>> A simple test for the CAN_RTR_FLAG will do the job as all the length
>>> sanitizing is already done in the tx path by can_dropped_invalid_skb()
>>> in all known drivers right *before* the skb is stored in the echo skb array.
>>>
>>> IMO there's no need for a separate helper function. Maybe a macro which
>>> should have something with 'payload' in its name - to determine the tx
>>> byte statistics based on CAN_RTR_FLAG ...
>> 
>> Actually, the main idea is not only to provide a valid length for the
>> tx statistics.
>> 
>> First fact is that many of the drivers (if not all?) will have the
>> same issue for the rx statistics as well, so this helper function can
>> be beneficial in other locations as well but that is not yet the main
>> point.
>
> Many drivers do not use the the return value of can_get_echo_skb() - and 
> yes, THAT should be fixed.
>
>
>> What bugs me the most, is that there is a global misunderstanding of
>> the definition of the Classical CAN frame's DLC in the kernel.
>
> No it is not (as described above).
>
> There is a clear separation that the DLC is only available in the driver 
> for putting the DLC into the CAN controller and retrieving it from there.
>
> Everything else (starting from the skb content) is a plain length 
> information.
>
>> ISO 11898-1:2015 tells us in section 8.4.2.4 "DLC field" that, I
>> quote: "[...]  *This DLC shall consist of 4 bit*. The admissible
>> number of data bytes for Classical Frames shall range from 0 to
>> 8. DLCs in the range of 0 to 7 shall indicate data fields of length of
>> 0 byte to 7 byte. In Classical Frames, *all other DLCs shall indicate
>> that the data field is 8 bytes long*. [...]"
>> 
>> So the DLC is a 4 bits value (meaning from 0 to 15) and all values
>> from 8 to 15 designate a data length of 8.
>> 
>> The real idea is to have an ISO 11898-1 compliant function to retrieve
>> the length.
>
> This is done on driver level using several helper functions like:
>
> https://elixir.bootlin.com/linux/latest/source/include/linux/can/dev.h#L92
>
> get_can_dlc()
> get_canfd_dlc()
>
> and conversions like
>
> https://elixir.bootlin.com/linux/latest/source/include/linux/can/dev.h#L174
>
> can_dlc2len()
> can_len2dlc()
>
>
>> 
>> That said, I hear your comment that the DLC is sanitized in
>> can_dropped_invalid_skb(). However, what bothers me is that this
>> sanitazation is done on false assumptions: Classical frames with DLC
>> greater than 8 and lower or equal to 15, which are valid by the
>> standard, are being discarded.
>
> Yes. But we don't talk about a DLC value here. We talk about a plain 
> length value.
>
>> To give you the full picture, I plan to send a RFC to fix this DLC
>> issue to allow Classical CAN frame with DLCs between 9 and 15 to be
>> sent and received in Socket CAN.
>
> Ehm - no. This is a pointless feature to allow values that have no 
> effect and might only happen on the wire (not even on the CAN controller 
> register level).
>
> The standard only defined how the CAN controller should react when it 
> gets these out-of-range values from the wire. The values 9 - 15 are not 
> distinguishable. Everything means 8.
>
>>  This can_get_len() function is one
>> piece of it, however, because I also found the bad RTR length issue, I
>> thought to introduce this one in advance. Sorry for the lack of
>> context.
>> 
>> The RFC I am thinking of will not be trivial. The main reason is that
>> the macro CAN_MAX_DLC which is incorrectly set to 8 instead of 15 is
>> exposed to the user land in include/linux/can.h. Modifying it would be
>> a no-go because we do not want to break user space. The direction
>> would be to have a socket option to set the maximum DLC to 15 on
>> demand and to keep it to 8 by default (this way, niche users who needs
>> this can do so but other people are not impacted).
>> 
>> One question you might ask is: "why should be allow DLC greater than
>> 8?". One concrete use case is security testing. In order to check for
>> vulnerabilities, we want to send such frames for test coverage
>> (especially during fuzz testing). Aside of that I do not know other
>> use cases. I am not aware of any OEM that would use this in production
>> but I still want to push to have this option in the kernel just for
>> the security testing reason.
>
> ??? Sorry but no.
>
> You can create a DLC of 13 for Classical CAN. And your CAN controller 
> gets 8 bytes from it and says to the network layer: Here is a CAN frame 
> with 8 bytes payload.
>
> Have you ever read out a CAN controller register and found a value 
> greater than 8 on the register of the DLC value?
>
> I believe you didn't. The CAN controller sees the DLC but provides 8 
> bytes as DLC=8.

Then you believed wrong :-)

To back up my claim, I can quote the M_CAN specification from Bosch:

https://www.bosch-semiconductors.com/media/ip_modules/pdf_2/m_can/mcan_users_manual_v330.pdf

In section 2.4.2 "Rx Buffer and FIFO Element" it is written, I quote:

R1A/B Bits 19:16 DLC[3:0]: Data Length Code
	0-8=CAN + CAN FD: received frame has 0-8 data bytes
	9-15=CAN: *received frame has 8 data bytes*
	9-15=CAN FD: received frame has 12/16/20/24/32/48/64 data bytes

So here, you have an example of a DLC register of a classical CAN RX
frame which can contains values from 9 to 15.

I also did the test. I can send a CAN with a DLC of 13 on one
controller and the other ones correctly received a frame of 8 bytes
with a DLC of 13.

The ISO 11898-1 does not differentiate between the DLC on the wire,
the DLC in the microcontroller's register or the DLC at the
application level. The text I quoted holds for all scenarios in the
same way as the CAN ID and the data are the same on the wire and at
the application level (excluding bit stuffing of course). If you can
provide a citation from the ISO that contradicts this, I would
immediately recognize being wrong.

After, I am not saying that absolutely all the controllers will allow
DLC greater than 8. I would not be surprised to see some controllers
attempting to do some sanitization (which would violate the ISO) and
maybe you did your testing on such controllers. Only thing I can tell
is that all the controllers I studied allowed it (I can give more
examples upon request).

As for security testing, I worked as a security consultant in the
automotive industry for the last three years and with our colleagues,
we witnessed some ECUs that would completely stop responding after
receiving some DLCs greater than 8 due to some buffer overflow. This
is a real thing which can be found in production, I think it would be
great to be able to test that using socket CAN.

Some professional tools such as the CAN testing suite of Defensics by
Synopsys also include these kind of tests. Because Socket CAN does not
support this, Synopsys actually recommends to use the proprietary
drivers from the Peak controller which do allow this (unfortunately,
the Defensics documentation is not available publicly so I can not
give you a link to support my claim on that last example).

>> I hope that you now understand the full idea behind this patch. If you
>> agree with my comments, please reconsider adding that patch.
>
> I don't believe it would be a good idea.
>
>> If I
>> failed to convince you and if you prefer to first see the full
>> picture, then I am OK to go with the simple test for the CAN_RTR_FLAG
>> as you suggested in your other patch and will come back to you later
>> on the MAX DLC topic when ready.
> 
>I hope I was able to convince you :-)

Unfortunately not but I appreciate that you spent your time to answer
me in such details :-)

I hope that I could highlight in this answer that I am more than just
a hobbyist who got exited after ready the ISO and that I know this
subject. What I explain here is well known in the niche community of
automotive security researcher but outside of it I just think that
people are not aware of it.

>> Thanks for reading me until the end!
>
> I did my very best.

So did I :-)


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 16:07           ` Vincent Mailhol
@ 2020-10-20 17:04             ` Oliver Hartkopp
  2020-10-20 18:50               ` Marc Kleine-Budde
  2020-10-21  0:52               ` Vincent Mailhol
  0 siblings, 2 replies; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-20 17:04 UTC (permalink / raw)
  To: Vincent Mailhol, Marc Kleine-Budde, linux-can
  Cc: kernel, Stéphane Grosjean

On 20.10.20 18:07, Vincent Mailhol wrote:

> I also did the test. I can send a CAN with a DLC of 13 on one
> controller and the other ones correctly received a frame of 8 bytes
> with a DLC of 13.

o_O

You see me perplexed ...

> After, I am not saying that absolutely all the controllers will allow
> DLC greater than 8. I would not be surprised to see some controllers
> attempting to do some sanitization (which would violate the ISO) and
> maybe you did your testing on such controllers. Only thing I can tell
> is that all the controllers I studied allowed it (I can give more
> examples upon request).

I believe you.

> As for security testing, I worked as a security consultant in the
> automotive industry for the last three years and with our colleagues,
> we witnessed some ECUs that would completely stop responding after
> receiving some DLCs greater than 8 due to some buffer overflow. This
> is a real thing which can be found in production, I think it would be
> great to be able to test that using socket CAN.

Yes. That's a valid use-case. Many people are testing CAN setups based 
on SocketCAN. So getting every aspect of CAN available is needed to be 
able to provide a real OSS solution.

> Some professional tools such as the CAN testing suite of Defensics by
> Synopsys also include these kind of tests. Because Socket CAN does not
> support this, Synopsys actually recommends to use the proprietary
> drivers from the Peak controller which do allow this (unfortunately,
> the Defensics documentation is not available publicly so I can not
> give you a link to support my claim on that last example).

Stephane from PEAK is working on the Linux driver (Mainline Linux & PEAK 
chardev), so I put him on CC. Or are you referring to the Windows driver?

> I hope that I could highlight in this answer that I am more than just
> a hobbyist who got exited after ready the ISO and that I know this
> subject. What I explain here is well known in the niche community of
> automotive security researcher but outside of it I just think that
> people are not aware of it.

Well I have done a lot in automotive CAN security too - with message 
authentication and with CAN IDS - but this DLC thing was still new to me ...

 From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in the 
netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.

This could switch the sanitizing AND the CAN controller can properly 
expose its ability to support this mode.

I think I have to pick a beer and look at some code ... :-)

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 17:04             ` Oliver Hartkopp
@ 2020-10-20 18:50               ` Marc Kleine-Budde
  2020-10-21  0:52               ` Vincent Mailhol
  1 sibling, 0 replies; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-20 18:50 UTC (permalink / raw)
  To: Oliver Hartkopp, Vincent Mailhol, linux-can
  Cc: Stéphane Grosjean, kernel

[-- Attachment #1.1: Type: text/plain, Size: 701 bytes --]

On 10/20/20 7:04 PM, Oliver Hartkopp wrote:
> On 20.10.20 18:07, Vincent Mailhol wrote:
> 
>> I also did the test. I can send a CAN with a DLC of 13 on one
>> controller and the other ones correctly received a frame of 8 bytes
>> with a DLC of 13.
> 
> o_O
> 
> You see me perplexed ...

[...]

> I think I have to pick a beer and look at some code ... :-)

Vincent, you successfully nerd-sniped Oliver!

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-20 17:04             ` Oliver Hartkopp
  2020-10-20 18:50               ` Marc Kleine-Budde
@ 2020-10-21  0:52               ` Vincent Mailhol
  2020-10-21  6:23                 ` Vincent MAILHOL
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent Mailhol @ 2020-10-21  0:52 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oliver Hartkopp, linux-can
  Cc: Vincent Mailhol, kernel, Stéphane Grosjean

>> Some professional tools such as the CAN testing suite of Defensics by
>> Synopsys also include these kind of tests. Because Socket CAN does not
>> support this, Synopsys actually recommends to use the proprietary
>> drivers from the Peak controller which do allow this (unfortunately,
>> the Defensics documentation is not available publicly so I can not
>> give you a link to support my claim on that last example).
> 
> Stephane from PEAK is working on the Linux driver (Mainline Linux & PEAK 
> chardev), so I put him on CC. Or are you referring to the Windows driver?

Good question. As far as I remember, the PEAK Windows driver and the
PEAK Linux char driver shares the same API and as such Defensics
supports both through a python middleware (the injector). This is only
true for Classical CAN. For CAN-FD, only Windows driver are officially
supported but hacking Defensics's injector code to have it run on
Linux is easily done (but that not the topic so let me end the
digression here).

>> I hope that I could highlight in this answer that I am more than just
>> a hobbyist who got exited after ready the ISO and that I know this
>> subject. What I explain here is well known in the niche community of
>> automotive security researcher but outside of it I just think that
>> people are not aware of it.
> 
> Well I have done a lot in automotive CAN security too - with message 
> authentication and with CAN IDS - but this DLC thing was still new to me ...

I also worked on CAN IDS. I was Japan regional FAE for the CycurIDS
product of Escrypt. Speaking of that, I believe you probably know Jan
Holle.

>  From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in the 
> netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.
> 
> This could switch the sanitizing AND the CAN controller can properly 
> expose its ability to support this mode.

Absolutely yes. In my first message, I mentioned the idea of managing
that through socket option, glad that we now share the same idea.

An other option I thought about would be to use one of the reserved
field of the struct can_frame but I am not fan of that second idea.

Do you want me to come back with an RFC patch? I already started to
thing about that months ago and did some testing. I identified the
code portion which have to be modified.

The only thing is that it will take me some time to draft a nice
proposal. Currently, I have another patch under review for a CAN
driver (https://lkml.org/lkml/2020/10/16/832), I want to finalize this
one first, and can get back to you after that. So that will mean that
we will be targeting Linux 5.11 end of this year, I do not think we
can squeeze that change in 5.10 merging windows.

> I think I have to pick a beer and look at some code ... :-)

Have a nice one! That might have been a shock for you :-)


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  0:52               ` Vincent Mailhol
@ 2020-10-21  6:23                 ` Vincent MAILHOL
  2020-10-21  7:11                   ` Joakim Zhang
  2020-10-21  9:21                   ` Oliver Hartkopp
  0 siblings, 2 replies; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-21  6:23 UTC (permalink / raw)
  To: Marc Kleine-Budde, Oliver Hartkopp, linux-can
  Cc: kernel, Vincent Mailhol, Stéphane Grosjean

> > From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in the
> > netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.
> >
> > This could switch the sanitizing AND the CAN controller can properly
> > expose its ability to support this mode.
>
> Absolutely yes. In my first message, I mentioned the idea of managing
> that through socket option, glad that we now share the same idea.

Actually, I just realized that I replied to you too quickly. I was not
exactly thinking of the same thing here so let me correct what I
previously said.

IFLA_CAN_CTRLMODE is at the netlink level. My idea is to have it, in
addition, at the socket level. Example: add CAN_RAW_RAW_DLC in
include/uapi/linux/can/raw.h.

The reason is that if we only manage it at the netlink level, some
application not aware of the RAW_DLC issue might run into some buffer
overflow issue. Unless an application directly requests it, the current
behaviour should be maintained (rationale: do not break userland).

So the full picture will be to have both the CAN_CTRLMODE_RAW_DLC at
netlink level and CAN_RAW_RAW_DLC at the socket level (in the exact same
way we have both CAN_CTRLMODE_FD and CAN_RAW_FD_FRAMES for
CAN-FD).


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  6:23                 ` Vincent MAILHOL
@ 2020-10-21  7:11                   ` Joakim Zhang
  2020-10-21  7:21                     ` Marc Kleine-Budde
  2020-10-21  9:21                   ` Oliver Hartkopp
  1 sibling, 1 reply; 45+ messages in thread
From: Joakim Zhang @ 2020-10-21  7:11 UTC (permalink / raw)
  To: Vincent MAILHOL, Marc Kleine-Budde, Oliver Hartkopp, linux-can
  Cc: kernel, Stéphane Grosjean


> -----Original Message-----
> From: Vincent MAILHOL <mailhol.vincent@wanadoo.fr>
> Sent: 2020年10月21日 14:24
> To: Marc Kleine-Budde <mkl@pengutronix.de>; Oliver Hartkopp
> <socketcan@hartkopp.net>; linux-can@vger.kernel.org
> Cc: kernel@pengutronix.de; Vincent Mailhol <mailhol.vincent@wanadoo.fr>;
> Stéphane Grosjean <s.grosjean@peak-system.com>
> Subject: Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to
> get the correct length of Classical frames
> 
> > > From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in
> > > the netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.
> > >
> > > This could switch the sanitizing AND the CAN controller can properly
> > > expose its ability to support this mode.
> >
> > Absolutely yes. In my first message, I mentioned the idea of managing
> > that through socket option, glad that we now share the same idea.
> 
> Actually, I just realized that I replied to you too quickly. I was not exactly
> thinking of the same thing here so let me correct what I previously said.
> 
> IFLA_CAN_CTRLMODE is at the netlink level. My idea is to have it, in addition,
> at the socket level. Example: add CAN_RAW_RAW_DLC in
> include/uapi/linux/can/raw.h.
> 
> The reason is that if we only manage it at the netlink level, some application
> not aware of the RAW_DLC issue might run into some buffer overflow issue.
> Unless an application directly requests it, the current behaviour should be
> maintained (rationale: do not break userland).

Hi Vincent Mailhol,

I wonder if it's appropriate to ask this question here, why this RAW_DLC issue might run into some buffer overflow issue? Will it cause frames dropped finally?

Best Regards,
Joakim Zhang
> So the full picture will be to have both the CAN_CTRLMODE_RAW_DLC at
> netlink level and CAN_RAW_RAW_DLC at the socket level (in the exact same
> way we have both CAN_CTRLMODE_FD and CAN_RAW_FD_FRAMES for
> CAN-FD).
> 
> 
> Yours sincerely,
> Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  7:11                   ` Joakim Zhang
@ 2020-10-21  7:21                     ` Marc Kleine-Budde
  2020-10-21  7:48                       ` Joakim Zhang
  0 siblings, 1 reply; 45+ messages in thread
From: Marc Kleine-Budde @ 2020-10-21  7:21 UTC (permalink / raw)
  To: Joakim Zhang, Vincent MAILHOL, Oliver Hartkopp, linux-can
  Cc: kernel, Stéphane Grosjean

[-- Attachment #1.1: Type: text/plain, Size: 856 bytes --]

On 10/21/20 9:11 AM, Joakim Zhang wrote:
> I wonder if it's appropriate to ask this question here, why this RAW_DLC
> issue might run into some buffer overflow issue? Will it cause frames dropped
> finally?

On the wire the dlc for CAN-2.0 frames ca go from 0...15. In the RX-path the DLC
for CAN-2.0 frames is capped to 8.

So userspace only ever sees a max dlc of 8. This way you can loop over the
struct can_frame::data. If we pass the dlc of the wire uncapped, you may loop
until 15 :)

In the TX-path we currently drop CAN-2.0 frames with dlc >8.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  7:21                     ` Marc Kleine-Budde
@ 2020-10-21  7:48                       ` Joakim Zhang
  0 siblings, 0 replies; 45+ messages in thread
From: Joakim Zhang @ 2020-10-21  7:48 UTC (permalink / raw)
  To: Marc Kleine-Budde, Vincent MAILHOL, Oliver Hartkopp, linux-can
  Cc: kernel, Stéphane Grosjean


> -----Original Message-----
> From: Marc Kleine-Budde <mkl@pengutronix.de>
> Sent: 2020年10月21日 15:21
> To: Joakim Zhang <qiangqing.zhang@nxp.com>; Vincent MAILHOL
> <mailhol.vincent@wanadoo.fr>; Oliver Hartkopp <socketcan@hartkopp.net>;
> linux-can@vger.kernel.org
> Cc: kernel@pengutronix.de; Stéphane Grosjean
> <s.grosjean@peak-system.com>
> Subject: Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to
> get the correct length of Classical frames
> 
> On 10/21/20 9:11 AM, Joakim Zhang wrote:
> > I wonder if it's appropriate to ask this question here, why this
> > RAW_DLC issue might run into some buffer overflow issue? Will it cause
> > frames dropped finally?
> 
> On the wire the dlc for CAN-2.0 frames ca go from 0...15. In the RX-path the
> DLC for CAN-2.0 frames is capped to 8.
> 
> So userspace only ever sees a max dlc of 8. This way you can loop over the
> struct can_frame::data. If we pass the dlc of the wire uncapped, you may loop
> until 15 :)
> 
> In the TX-path we currently drop CAN-2.0 frames with dlc >8.

Thanks Marc, now I understand the meaning of "run into some buffer overflow". 😊

Best Regards,
Joakim Zhang
> regards,
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  6:23                 ` Vincent MAILHOL
  2020-10-21  7:11                   ` Joakim Zhang
@ 2020-10-21  9:21                   ` Oliver Hartkopp
  2020-10-21  9:48                     ` Oliver Hartkopp
  2020-10-21 11:55                     ` Vincent MAILHOL
  1 sibling, 2 replies; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-21  9:21 UTC (permalink / raw)
  To: Vincent MAILHOL, Marc Kleine-Budde, linux-can
  Cc: kernel, Stéphane Grosjean

Hi Vincent,

On 21.10.20 08:23, Vincent MAILHOL wrote:
>>>  From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in the
>>> netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.
>>>
>>> This could switch the sanitizing AND the CAN controller can properly
>>> expose its ability to support this mode.
>>
>> Absolutely yes. In my first message, I mentioned the idea of managing
>> that through socket option, glad that we now share the same idea.
> 
> Actually, I just realized that I replied to you too quickly. I was not
> exactly thinking of the same thing here so let me correct what I
> previously said.
> 
> IFLA_CAN_CTRLMODE is at the netlink level. My idea is to have it, in
> addition, at the socket level. Example: add CAN_RAW_RAW_DLC in
> include/uapi/linux/can/raw.h.

Yes. We need at least some different handling inside the driver level 
which could be switched with CAN_CTRLMODE_RAW_DLC.

> The reason is that if we only manage it at the netlink level, some
> application not aware of the RAW_DLC issue might run into some buffer
> overflow issue. Unless an application directly requests it, the current
> behaviour should be maintained (rationale: do not break userland).
> 
> So the full picture will be to have both the CAN_CTRLMODE_RAW_DLC at
> netlink level and CAN_RAW_RAW_DLC at the socket level (in the exact same
> way we have both CAN_CTRLMODE_FD and CAN_RAW_FD_FRAMES for
> CAN-FD).

Yes. That hits the point.

Btw. the impact on all protocols seems to be pretty heavy and to me it 
turned out that it would be a bad idea to use can_frame.can_dlc as 
transport vehicle for the raw dlc. Especially as this will contradict 
the rule that the can_dlc element is a plain length information today as 
canfd_frame.len and shares the same position in the struct.

I currently tend to only have a switch on driver level with 
CAN_CTRLMODE_RAW_DLC and make use of can_frame._res0 -> can_frame.raw_dlc

With CAN_CTRLMODE_RAW_DLC enabled the CAN driver would ...

- fill can_dlc and raw_dlc at reception time
- will use raw_dlc instead of can_dlc at tx time
- will use can_dlc if raw_dlc == 0 at tx time

This would have a minimal impact on the code and we only need to make 
sure that the raw_dlc is not killed at some stage in the network layer.

#define CAN_MAX_RAW_DLC 15

IMO the raw dlc use-case is a really special case for testing purposes. 
Touching the current can_frame.can_dlc handling could lead to more 
complexity and the fear to run into overflows as already stated by Joakim.

What do you think about the above approach?

Regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  9:21                   ` Oliver Hartkopp
@ 2020-10-21  9:48                     ` Oliver Hartkopp
  2020-10-21 11:55                     ` Vincent MAILHOL
  1 sibling, 0 replies; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-21  9:48 UTC (permalink / raw)
  To: Vincent MAILHOL, Marc Kleine-Budde, linux-can
  Cc: kernel, Stéphane Grosjean

Answering myself ...

On 21.10.20 11:21, Oliver Hartkopp wrote:

> With CAN_CTRLMODE_RAW_DLC enabled the CAN driver would ...
> 
> - fill can_dlc and raw_dlc at reception time
> - will use raw_dlc instead of can_dlc at tx time
> - will use can_dlc if raw_dlc == 0 at tx time

To be more compatible to non raw dlc CAN frame sources the tx handling 
could also be like this:

if ((can_dlc == CAN_MAX_DLEN) &&
     (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
=> use raw_dlc



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21  9:21                   ` Oliver Hartkopp
  2020-10-21  9:48                     ` Oliver Hartkopp
@ 2020-10-21 11:55                     ` Vincent MAILHOL
  2020-10-21 17:52                       ` Oliver Hartkopp
  1 sibling, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-21 11:55 UTC (permalink / raw)
  To: Oliver Hartkopp, Marc Kleine-Budde
  Cc: linux-can, kernel, Stéphane Grosjean, Vincent Mailhol

> On 21.10.20 18:48, Oliver Hartkopp wrote:
>> On 21.10.20 08:23, Vincent MAILHOL wrote:
>>>>>  From a first thought I would see a new flag CAN_CTRLMODE_RAW_DLC in the
>>>>> netlink interface of IFLA_CAN_CTRLMODE for the CAN controller driver.
>>>>>
>>>>> This could switch the sanitizing AND the CAN controller can properly
>>>>> expose its ability to support this mode.
>>>>
>>>> Absolutely yes. In my first message, I mentioned the idea of managing
>>>> that through socket option, glad that we now share the same idea.
>>>
>>> Actually, I just realized that I replied to you too quickly. I was not
>>> exactly thinking of the same thing here so let me correct what I
>>> previously said.
>>>
>>> IFLA_CAN_CTRLMODE is at the netlink level. My idea is to have it, in
>>> addition, at the socket level. Example: add CAN_RAW_RAW_DLC in
>>> include/uapi/linux/can/raw.h.
>>
>> Yes. We need at least some different handling inside the driver level
>> which could be switched with CAN_CTRLMODE_RAW_DLC.
>>
>>> The reason is that if we only manage it at the netlink level, some
>>> application not aware of the RAW_DLC issue might run into some buffer
>>> overflow issue. Unless an application directly requests it, the current
>>> behaviour should be maintained (rationale: do not break userland).
>>>
>>> So the full picture will be to have both the CAN_CTRLMODE_RAW_DLC at
>>> netlink level and CAN_RAW_RAW_DLC at the socket level (in the exact same
>>> way we have both CAN_CTRLMODE_FD and CAN_RAW_FD_FRAMES for
>>> CAN-FD).
>>
>> Yes. That hits the point.
>>
>> Btw. the impact on all protocols seems to be pretty heavy and to me it
>> turned out that it would be a bad idea to use can_frame.can_dlc as
>> transport vehicle for the raw dlc. Especially as this will contradict
>> the rule that the can_dlc element is a plain length information today as
>> canfd_frame.len and shares the same position in the struct.
>>
>> I currently tend to only have a switch on driver level with
>> CAN_CTRLMODE_RAW_DLC and make use of can_frame._res0 -> can_frame.raw_dlc
>>
>> With CAN_CTRLMODE_RAW_DLC enabled the CAN driver would ...
>>
>> - fill can_dlc and raw_dlc at reception time
>> - will use raw_dlc instead of can_dlc at tx time
>> - will use can_dlc if raw_dlc == 0 at tx time
>
> To be more compatible to non raw dlc CAN frame sources the tx handling
> could also be like this:
>
> if ((can_dlc == CAN_MAX_DLEN) &&
>      (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
> => use raw_dlc
>
>>
>> This would have a minimal impact on the code and we only need to make
>> sure that the raw_dlc is not killed at some stage in the network layer.
>>
>> #define CAN_MAX_RAW_DLC 15
>>
>> IMO the raw dlc use-case is a really special case for testing purposes.
>> Touching the current can_frame.can_dlc handling could lead to more
>> complexity and the fear to run into overflows as already stated by Joakim.
>>
>> What do you think about the above approach?

If I understand well, the idea would be not to use a setsockopt() but
instead rely on some logic on the can_dlc and raw_dlc to determine
which one to use.

Unfortunately, this approach has one issue in the TX path that could
break existing applications.

Consider below code (which I think is fairly realistic):

void send_frame (int soc, canid_t can_id, char *data, size_t len)
{
    struct can_frame cf;
    size_t dlc = len > sizeof(cf.data) ? sizeof(cf.data) : len;

    cf.can_id = can_id;
    cf.can_dlc = dlc;
    memcpy(cf.data, data, dlc);

    write(soc, &cf, sizeof(cf));
}

Here, the user did not initialize the can frame (cf) but assigned all
the relevant fields manually. Because cf is not initialized, the newly
introduced cf.dlc_raw field would have any of the values which was
present on the stack at the moment (rationale: the C standard does not
guarantee zero initialization). If 9 <= raw_dlc <= 15, the can frame
will be transmitted with a bad DLC value. If raw_dlc > 15, the can
frame will be discarded.

I think that it is mandatory to have the application declare that it
wants to use raw DLCs (this way, we know that the code is "DLC
aware"). I can not think of any transparent approach.

Next, we might think of using the netlink CAN_CTRLMODE_RAW_DLC and
the CAN_RAW_RAW_DLC socket option and the raw_dlc field. But I think
that this is overkill.

If not introducing new dlc_raw field, the drawback, as you pointed
out, will be that we would lose the backward compatibility of
canfd_frame with can_frame and that can_dlc field can not be used
directly as a length.

For userland, I think that this is acceptable because the very instant
the user calls setsockopt() with the CAN_RAW_RAW_DLC, he should be
aware of the consequences and should resign to use can_dlc field as a
plain length. That of course means that this should be clearly
highlighted when updating the documentation. And users not interested
by this feature can continue to use it as they did before.

Inside the kernel, all drivers advertising CAN_CTRLMODE_RAW_DLC will
also resign the right to use can_dlc as plain length. Drivers not
using it are safe with their existing code. Finally, the TX and RX
path would both need to be inspected in detail.

TLDR; socket options seem mandatory in all cases. We then have to
choose between breaking the can_dlc plain length property or
introducing a new raw_dlc field. My choice goes to the first option of
breaking the can_dlc plain length property.

That said, I am curious about what other people think.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21 11:55                     ` Vincent MAILHOL
@ 2020-10-21 17:52                       ` Oliver Hartkopp
  2020-10-22  3:30                         ` Vincent MAILHOL
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-21 17:52 UTC (permalink / raw)
  To: Vincent MAILHOL, Marc Kleine-Budde
  Cc: linux-can, kernel, Stéphane Grosjean

On 21.10.20 13:55, Vincent MAILHOL wrote:
>> On 21.10.20 18:48, Oliver Hartkopp wrote:

>> To be more compatible to non raw dlc CAN frame sources the tx handling
>> could also be like this:
>>
>> if ((can_dlc == CAN_MAX_DLEN) &&
>>       (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>> => use raw_dlc
>>

(..)

> 
> If I understand well, the idea would be not to use a setsockopt() but
> instead rely on some logic on the can_dlc and raw_dlc to determine
> which one to use.
> 
> Unfortunately, this approach has one issue in the TX path that could
> break existing applications.
> 
> Consider below code (which I think is fairly realistic):
> 
> void send_frame (int soc, canid_t can_id, char *data, size_t len)
> {
>      struct can_frame cf;
>      size_t dlc = len > sizeof(cf.data) ? sizeof(cf.data) : len;
> 
>      cf.can_id = can_id;
>      cf.can_dlc = dlc;
>      memcpy(cf.data, data, dlc);
> 
>      write(soc, &cf, sizeof(cf));
> }
> 
> Here, the user did not initialize the can frame (cf) but assigned all
> the relevant fields manually. Because cf is not initialized, the newly
> introduced cf.dlc_raw field would have any of the values which was
> present on the stack at the moment (rationale: the C standard does not
> guarantee zero initialization). If 9 <= raw_dlc <= 15, the can frame
> will be transmitted with a bad DLC value. If raw_dlc > 15, the can
> frame will be discarded.

No, this is not what I wrote. With my suggestion you need to populate 
both dlc elements to use the new "raw dlc" feature.

if (can_dlc == 8) && (9 <= raw_dlc <= 15)
=> put raw_dlc value into the controller register
else
=> put can_dlc value into the controller register

When you have a test system to make security tests and you enable 
CAN_CTRLMODE_RAW_DLC on a specific CAN interface - which applications 
would you run to send CAN frames on this interface?

I assume only the test application which is really aware of setting 
can_dlc and raw_dlc correctly.

CAN_CTRLMODE_RAW_DLC is no 'standard' option. It's a testing facility 
and only used, when people want to play with raw DLCs.

An option could be to introduce a sockopt CAN_RAW_RAW_DLC that only 
forwards the raw_dlc element for classic CAN frames when enabled, and 
clears the raw_dlc field otherwise.

But again: Who ever enables CAN_CTRLMODE_RAW_DLC has a specific use-case 
in mind and knows what he's doing.

> I think that it is mandatory to have the application declare that it
> wants to use raw DLCs (this way, we know that the code is "DLC
> aware"). I can not think of any transparent approach.
> 
> Next, we might think of using the netlink CAN_CTRLMODE_RAW_DLC and
> the CAN_RAW_RAW_DLC socket option and the raw_dlc field. But I think
> that this is overkill.
> 
> If not introducing new dlc_raw field, the drawback, as you pointed
> out, will be that we would lose the backward compatibility of
> canfd_frame with can_frame and that can_dlc field can not be used
> directly as a length.
> 
> For userland, I think that this is acceptable because the very instant
> the user calls setsockopt() with the CAN_RAW_RAW_DLC, he should be
> aware of the consequences and should resign to use can_dlc field as a
> plain length.

Not filling can_dlc would cause tons of changes for sanity checks and 
feature switches.

Therefore everything should remain as-is and ONLY in the case of 
CAN_CTRLMODE_RAW_DLC and can_dlc == 8 the CAN driver validates the 
raw_dlc value element and uses it for transmission.

> That of course means that this should be clearly
> highlighted when updating the documentation. And users not interested
> by this feature can continue to use it as they did before.
> 
> Inside the kernel, all drivers advertising CAN_CTRLMODE_RAW_DLC will
> also resign the right to use can_dlc as plain length.

As explained before. This would cause effort without any need for the 
can_dlc handling.

> Drivers not
> using it are safe with their existing code. Finally, the TX and RX
> path would both need to be inspected in detail.

Yep.

> TLDR; socket options seem mandatory in all cases. We then have to
> choose between breaking the can_dlc plain length property or
> introducing a new raw_dlc field. My choice goes to the first option of
> breaking the can_dlc plain length property.

The 14 year old documentation in can.h says:
@can_dlc: frame payload length in byte (0 .. 8)

I will not break this established rule for a testing feature. The 
question from Joakim clearly showed: Don't touch this!

At the end it would have made more sense to call it can_frame.len in the 
first place. But this first came into our mind when CAN FD showed up. 
The discussion about the can_dlc meaning can be closed IMO. It is a 
plain length from 0..8 which is unfortunately named can_dlc.

Regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-21 17:52                       ` Oliver Hartkopp
@ 2020-10-22  3:30                         ` Vincent MAILHOL
  2020-10-22  7:15                           ` Oliver Hartkopp
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-22  3:30 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean

On 21.10.21 02:52, Oliver Hartkopp wrote:
> On 21.10.20 13:55, Vincent MAILHOL wrote:
>>> On 21.10.20 18:48, Oliver Hartkopp wrote:
>
>>> To be more compatible to non raw dlc CAN frame sources the tx handling
>>> could also be like this:
>>>
>>> if ((can_dlc == CAN_MAX_DLEN) &&
>>>       (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>>> => use raw_dlc
>>>

In addition to that, we would have to decide what to do with malformed
frames. If can_dlc and can_raw are set to contradictory values, do we
drop the frame, do we just ignore the raw_dlc? For example, if can_dlc
is 8 but raw_dlc is 2, I think that we should drop.

If we go this direction, I would propose below logic:

 1. if raw_dlc equals can_dlc: nominal case.

 2. if can_dlc equals 8 and raw_dlc is greater than 8: use raw_dlc
    case, already covered in your code.

 3. if raw_dlc is 0, then consider it as unset: overwrite raw_dlc with
    can_dlc (so that it becomes case 1).

 4. Other scenarios: the frame is malformed: drop it.

Logic should apply for both Tx and Rx paths.

Then, the drivers/users would only have to manage scenarios 1 and 2
(and 2 can be ignored if CAN_CTRLMODE_RAW_DLC is not advertised).

>>
>> If I understand well, the idea would be not to use a setsockopt() but
>> instead rely on some logic on the can_dlc and raw_dlc to determine
>> which one to use.
>>
>> Unfortunately, this approach has one issue in the TX path that could
>> break existing applications.
>>
>> Consider below code (which I think is fairly realistic):
>>
>> void send_frame (int soc, canid_t can_id, char *data, size_t len)
>> {
>>      struct can_frame cf;
>>      size_t dlc = len > sizeof(cf.data) ? sizeof(cf.data) : len;
>>
>>      cf.can_id = can_id;
>>      cf.can_dlc = dlc;
>>      memcpy(cf.data, data, dlc);
>>
>>      write(soc, &cf, sizeof(cf));
>> }
>>
>> Here, the user did not initialize the can frame (cf) but assigned all
>> the relevant fields manually. Because cf is not initialized, the newly
>> introduced cf.dlc_raw field would have any of the values which was
>> present on the stack at the moment (rationale: the C standard does not
>> guarantee zero initialization). If 9 <= raw_dlc <= 15, the can frame
>> will be transmitted with a bad DLC value. If raw_dlc > 15, the can
>> frame will be discarded.
>
> No, this is not what I wrote. With my suggestion you need to populate
> both dlc elements to use the new "raw dlc" feature.
>
> if (can_dlc == 8) && (9 <= raw_dlc <= 15)
> => put raw_dlc value into the controller register
> else
> => put can_dlc value into the controller register
>
> When you have a test system to make security tests and you enable
> CAN_CTRLMODE_RAW_DLC on a specific CAN interface - which applications
> would you run to send CAN frames on this interface?
>
> I assume only the test application which is really aware of setting
> can_dlc and raw_dlc correctly.

My point is that this assumption is dangerous.

I do not think I made myself clear. I am speaking about old "non-DLC
aware" code running in CAN_CTRLMODE_RAW_DLC context.

Consider two applications: the first application is a test application
which is "DLC aware", the second is a legacy application which
contains the code which I presented in my previous message.

A user who wants to run both in parallel sets CAN_CTRLMODE_RAW_DLC at
netlink level. First application works fine, second application which
contains the legacy code gets impacted as explained previously and
stops behaving as intended.

Newly introduced option should not break existing code regardless
why. In opposition, we can introduce new rules if these are strictly
tied to the new option.

In my mind, imposing a rule that old code should not be used in
CAN_CTRLMODE_RAW_DLC context would create a huge pitfall and would
violate the "don't break userland" principle.

> CAN_CTRLMODE_RAW_DLC is no 'standard' option. It's a testing facility
> and only used, when people want to play with raw DLCs.
>
> An option could be to introduce a sockopt CAN_RAW_RAW_DLC that only
> forwards the raw_dlc element for classic CAN frames when enabled, and
> clears the raw_dlc field otherwise.
>
> But again: Who ever enables CAN_CTRLMODE_RAW_DLC has a specific use-case
> in mind and knows what he's doing.

Yes for the new code, no for the legacy code. This is why I think it
has to be managed at the application level, not only at netlink level.

>> I think that it is mandatory to have the application declare that it
>> wants to use raw DLCs (this way, we know that the code is "DLC
>> aware"). I can not think of any transparent approach.
>>
>> Next, we might think of using the netlink CAN_CTRLMODE_RAW_DLC and
>> the CAN_RAW_RAW_DLC socket option and the raw_dlc field. But I think
>> that this is overkill.
>>
>> If not introducing new dlc_raw field, the drawback, as you pointed
>> out, will be that we would lose the backward compatibility of
>> canfd_frame with can_frame and that can_dlc field can not be used
>> directly as a length.
>>
>> For userland, I think that this is acceptable because the very instant
>> the user calls setsockopt() with the CAN_RAW_RAW_DLC, he should be
>> aware of the consequences and should resign to use can_dlc field as a
>> plain length.
>
> Not filling can_dlc would cause tons of changes for sanity checks and
> feature switches.

I never spoke about "not filling can_dlc". My point is to not use it
as a length but use it as a DLC according to ISO definition. In my
approach, can_dlc should always be filled with the raw DLC value.

> Therefore everything should remain as-is and ONLY in the case of
> CAN_CTRLMODE_RAW_DLC and can_dlc == 8 the CAN driver validates the
> raw_dlc value element and uses it for transmission.
>
>> That of course means that this should be clearly
>> highlighted when updating the documentation. And users not interested
>> by this feature can continue to use it as they did before.
>>
>> Inside the kernel, all drivers advertising CAN_CTRLMODE_RAW_DLC will
>> also resign the right to use can_dlc as plain length.
>
> As explained before. This would cause effort without any need for the
> can_dlc handling.

With some relevant helper functions (can_get_raw_dlc(),
can_get_len()), I think that the effort for can_dlc handling would be
comparable to the effort for raw_dlc handling.

>> Drivers not
>> using it are safe with their existing code. Finally, the TX and RX
>> path would both need to be inspected in detail.
>
> Yep.
>
>> TLDR; socket options seem mandatory in all cases. We then have to
>> choose between breaking the can_dlc plain length property or
>> introducing a new raw_dlc field. My choice goes to the first option of
>> breaking the can_dlc plain length property.
>
> The 14 year old documentation in can.h says:
> @can_dlc: frame payload length in byte (0 .. 8)
>
> I will not break this established rule for a testing feature. The
> question from Joakim clearly showed: Don't touch this!

My point is this is an expert feature: if you do not understand it, do
not use it and you are safe to go using the 14 years old definition.

> At the end it would have made more sense to call it can_frame.len in the
> first place. But this first came into our mind when CAN FD showed up.
> The discussion about the can_dlc meaning can be closed IMO. It is a
> plain length from 0..8 which is unfortunately named can_dlc.

I agree that it should have been named len from the beginning, but as
a matter of fact it has been named "can_dlc". For me as a user, I
expect can_dlc to follow the DLC definition in ISO and so I *prefer*
to opt for the compromise of losing the plain length property rather
than losing the semantic meaning.

I would like to conclude by saying that I do necessarily think that
your approach of raw_dlc field is bad (if used in combination with the
setsockopt()). It has its pros and cons. However, if you ask me for
feedback, *my answer* is that *I prefer* to use can_dlc as a DLC. But
at the end of the day, I would be happy if the feature gets
implemented, regardless how, so that I can do my testing :-)


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22  3:30                         ` Vincent MAILHOL
@ 2020-10-22  7:15                           ` Oliver Hartkopp
  2020-10-22 12:23                             ` Vincent MAILHOL
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-22  7:15 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean



On 22.10.20 05:30, Vincent MAILHOL wrote:
> On 21.10.21 02:52, Oliver Hartkopp wrote:
>> On 21.10.20 13:55, Vincent MAILHOL wrote:
>>>> On 21.10.20 18:48, Oliver Hartkopp wrote:
>>
>>>> To be more compatible to non raw dlc CAN frame sources the tx handling
>>>> could also be like this:
>>>>
>>>> if ((can_dlc == CAN_MAX_DLEN) &&
>>>>        (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>>>> => use raw_dlc
>>>>
> 
> In addition to that, we would have to decide what to do with malformed
> frames. If can_dlc and can_raw are set to contradictory values, do we
> drop the frame, do we just ignore the raw_dlc? For example, if can_dlc
> is 8 but raw_dlc is 2, I think that we should drop.

The pseudo code is pretty clear on this: Use can_dlc.

The nice thing about testing raw_dlc to be 9..15 when can_dlc == 8 is, 
that we always have a correct CAN frame on the wire with 8 bytes of content.

Ok, the raw_dlc might be an uninitialized value - but still everything 
remains fine for operation. So why dropping frames here?

The problem of uninitialized values could be solved with a 
CAN_RAW_RAW_DLC sockopt BUT you will be always able to send a CAN frame 
with an AF_PACKET socket which has no CAN_RAW_RAW_DLC sockopt.

And then?

The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a 
pretty safe option to me. Even if 'legacy' applications with 
uninitialized raw_dlc send CAN frames or AF_PACKET users enable 
CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a 
fitting valid raw_dlc.

> 
> If we go this direction, I would propose below logic:
> 
>   1. if raw_dlc equals can_dlc: nominal case.

Is not tested. We would start on testing can_dlc == 8

> 
>   2. if can_dlc equals 8 and raw_dlc is greater than 8: use raw_dlc
>      case, already covered in your code.

ACK

>   3. if raw_dlc is 0, then consider it as unset: overwrite raw_dlc with
>      can_dlc (so that it becomes case 1).

Is not tested like this. We would start on can_dlc == 8 - and then check 
for valid raw_dlc 9..15

> 
>   4. Other scenarios: the frame is malformed: drop it.

No dropping.

> Logic should apply for both Tx and Rx paths.

So test for can_dlc == 8 - and then check for valid raw_dlc 9..15 in the 
user space?! Fine.

Btw. we can make sure that we do not have uninitialized raw_dlc value in 
the rx path in the enabled drivers. In fact you could always use the 
raw_dlc then.

BUT it makes sense to use this test to detect cases, where someone 
forgot to switch on CAN_CTRLMODE_RAW_DLC or the driver doesn't support 
it and the programmer did not check the ctrlmode netlink return values.

> Then, the drivers/users would only have to manage scenarios 1 and 2
> (and 2 can be ignored if CAN_CTRLMODE_RAW_DLC is not advertised).
> 
>>>
>>> If I understand well, the idea would be not to use a setsockopt() but
>>> instead rely on some logic on the can_dlc and raw_dlc to determine
>>> which one to use.
>>>
>>> Unfortunately, this approach has one issue in the TX path that could
>>> break existing applications.
>>>
>>> Consider below code (which I think is fairly realistic):
>>>
>>> void send_frame (int soc, canid_t can_id, char *data, size_t len)
>>> {
>>>       struct can_frame cf;
>>>       size_t dlc = len > sizeof(cf.data) ? sizeof(cf.data) : len;
>>>
>>>       cf.can_id = can_id;
>>>       cf.can_dlc = dlc;
>>>       memcpy(cf.data, data, dlc);
>>>
>>>       write(soc, &cf, sizeof(cf));
>>> }
>>>
>>> Here, the user did not initialize the can frame (cf) but assigned all
>>> the relevant fields manually. Because cf is not initialized, the newly
>>> introduced cf.dlc_raw field would have any of the values which was
>>> present on the stack at the moment (rationale: the C standard does not
>>> guarantee zero initialization). If 9 <= raw_dlc <= 15, the can frame
>>> will be transmitted with a bad DLC value. If raw_dlc > 15, the can
>>> frame will be discarded.
>>
>> No, this is not what I wrote. With my suggestion you need to populate
>> both dlc elements to use the new "raw dlc" feature.
>>
>> if (can_dlc == 8) && (9 <= raw_dlc <= 15)
>> => put raw_dlc value into the controller register
>> else
>> => put can_dlc value into the controller register
>>
>> When you have a test system to make security tests and you enable
>> CAN_CTRLMODE_RAW_DLC on a specific CAN interface - which applications
>> would you run to send CAN frames on this interface?
>>
>> I assume only the test application which is really aware of setting
>> can_dlc and raw_dlc correctly.
> 
> My point is that this assumption is dangerous.
> 
> I do not think I made myself clear. I am speaking about old "non-DLC
> aware" code running in CAN_CTRLMODE_RAW_DLC context.
> 
> Consider two applications: the first application is a test application
> which is "DLC aware", the second is a legacy application which
> contains the code which I presented in my previous message.
> 
> A user who wants to run both in parallel sets CAN_CTRLMODE_RAW_DLC at
> netlink level. First application works fine, second application which
> contains the legacy code gets impacted as explained previously and
> stops behaving as intended.
> 
> Newly introduced option should not break existing code regardless
> why. In opposition, we can introduce new rules if these are strictly
> tied to the new option.

Yes but it doesn't help you with AF_PACKET.

> In my mind, imposing a rule that old code should not be used in
> CAN_CTRLMODE_RAW_DLC context would create a huge pitfall and would
> violate the "don't break userland" principle.

CAN_RAW_RAW_DLC could help in the case of legacy code only when they are 
using CAN_RAW sockets. But nothing more.

Therefore we need the robustness on driver level with the checking of 
can_dlc == 8.

(..)

>>> For userland, I think that this is acceptable because the very instant
>>> the user calls setsockopt() with the CAN_RAW_RAW_DLC, he should be
>>> aware of the consequences and should resign to use can_dlc field as a
>>> plain length.
>>
>> Not filling can_dlc would cause tons of changes for sanity checks and
>> feature switches.
> 
> I never spoke about "not filling can_dlc". My point is to not use it
> as a length but use it as a DLC according to ISO definition. In my
> approach, can_dlc should always be filled with the raw DLC value.

Ok, let me rephrase: Filling can_dlc with something else than a plain 
length information 0..8 ;-)

(..)

>> The 14 year old documentation in can.h says:
>> @can_dlc: frame payload length in byte (0 .. 8)
>>
>> I will not break this established rule for a testing feature. The
>> question from Joakim clearly showed: Don't touch this!
> 
> My point is this is an expert feature: if you do not understand it, do
> not use it and you are safe to go using the 14 years old definition.

Full ACK - but you can't imagine what people do in the real world. Do 
you know these 'more helps more' guys that switch everything to 'on'? :-D

The approach with CAN_CTRLMODE_RAW_DLC and the described testing would 
be even robust against unintended miss-use.

>> At the end it would have made more sense to call it can_frame.len in the
>> first place. But this first came into our mind when CAN FD showed up.
>> The discussion about the can_dlc meaning can be closed IMO. It is a
>> plain length from 0..8 which is unfortunately named can_dlc.
> 
> I agree that it should have been named len from the beginning, but as
> a matter of fact it has been named "can_dlc". For me as a user, I
> expect can_dlc to follow the DLC definition in ISO and so I *prefer*
> to opt for the compromise of losing the plain length property rather
> than losing the semantic meaning.

I don't think you can claim to be a standard user. You are an expert for 
security testing and digged into the ISO standard and intentionally 
write DLC values >8 into controller registers. Users don't do things 
like this ;-)

Clean-up and renaming typically ends when it breaks APIs and commonly 
followed rules. There are many examples in Linux where these clean-up 
attempts have been canceled for these reasons.

I would have liked to rename can_dlc to len too.
But now that we have a compatible canfd_frame which can contain a 
can_frame this is somehow settled: can_dlc == len.

> I would like to conclude by saying that I do necessarily think that
> your approach of raw_dlc field is bad (if used in combination with the
> setsockopt()). It has its pros and cons. However, if you ask me for
> feedback, *my answer* is that *I prefer* to use can_dlc as a DLC. But
> at the end of the day, I would be happy if the feature gets
> implemented, regardless how, so that I can do my testing :-)

So let's see how we can get there best :-)

Best regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22  7:15                           ` Oliver Hartkopp
@ 2020-10-22 12:23                             ` Vincent MAILHOL
  2020-10-22 13:28                               ` Oliver Hartkopp
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-22 12:23 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean,
	Vincent Mailhol

On 22.10.20 16:15, Oliver Hartkopp wrote:
> On 22.10.20 05:30, Vincent MAILHOL wrote:
>> On 22.10.20 02:52, Oliver Hartkopp wrote:
>>> On 21.10.20 13:55, Vincent MAILHOL wrote:
>>>>> On 21.10.20 18:48, Oliver Hartkopp wrote:
>>>
>>>>> To be more compatible to non raw dlc CAN frame sources the tx handling
>>>>> could also be like this:
>>>>>
>>>>> if ((can_dlc == CAN_MAX_DLEN) &&
>>>>>        (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>>>>> => use raw_dlc
>>>>>
>>
>> In addition to that, we would have to decide what to do with malformed
>> frames. If can_dlc and can_raw are set to contradictory values, do we
>> drop the frame, do we just ignore the raw_dlc? For example, if can_dlc
>> is 8 but raw_dlc is 2, I think that we should drop.
>
> The pseudo code is pretty clear on this: Use can_dlc.
>
> The nice thing about testing raw_dlc to be 9..15 when can_dlc == 8 is,
> that we always have a correct CAN frame on the wire with 8 bytes of content.
>
> Ok, the raw_dlc might be an uninitialized value - but still everything
> remains fine for operation. So why dropping frames here?

If the can_dlc and raw_dlc are in contradiction, then the user is
trying to do something not supported. After introducing raw_dlc,
someone might want to try sending such malformed frames. If the
operation succeeds, he then might be fooled in thinking that he
actually send, for example, a frame of eight bytes with a DLC of 2.

In such a scenario, I think it is better to tell the user: "what you
are doing is not supported" by returning -EINVAL (similar to what is
currently done when an invalid skb is sent).

> The problem of uninitialized values could be solved with a
> CAN_RAW_RAW_DLC sockopt BUT you will be always able to send a CAN frame
> with an AF_PACKET socket which has no CAN_RAW_RAW_DLC sockopt.
>
> And then?

I am not competent enough on that point to comment. I was not aware of
the possibility of using AF_PACKET to send CAN frames. Do you have a
link to any example of such code? I would be glad to study it and come
back to you once I understand this point.

> The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a
> pretty safe option to me. Even if 'legacy' applications with
> uninitialized raw_dlc send CAN frames or AF_PACKET users enable
> CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a
> fitting valid raw_dlc.

But what if that "fitting valid raw_dlc" is greater than 8? The kernel
would see that frame as valid, a frame with a DLC greater than 8 will
be sent on the wire and received by the other devices than might be
connected on the CAN bus.

>>
>> If we go this direction, I would propose below logic:
>>
>>   1. if raw_dlc equals can_dlc: nominal case.
>
> Is not tested. We would start on testing can_dlc == 8
>
>>
>>   2. if can_dlc equals 8 and raw_dlc is greater than 8: use raw_dlc
>>      case, already covered in your code.
>
> ACK
>
>>   3. if raw_dlc is 0, then consider it as unset: overwrite raw_dlc with
>>      can_dlc (so that it becomes case 1).
>
> Is not tested like this. We would start on can_dlc == 8 - and then check
> for valid raw_dlc 9..15
>
>>
>>   4. Other scenarios: the frame is malformed: drop it.
>
> No dropping.
>
>> Logic should apply for both Tx and Rx paths.
>
> So test for can_dlc == 8 - and then check for valid raw_dlc 9..15 in the
> user space?! Fine.

My view: users and drivers directly use raw_dlc without need to check
for can_dlc == 8 (detailed explanation below).

> Btw. we can make sure that we do not have uninitialized raw_dlc value in
> the rx path in the enabled drivers. In fact you could always use the
> raw_dlc then.

My view is that if we introduce the raw_dlc field, we should always
populate it with the DLC value as defined in ISO, even if
CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active. User could
then benefit from this field to get the DLC value (and maximum value
would be either CAN_MAX_DLC or CAN_MAX_DLC RAW depending on the
context).

Rationale is that the raw_dlc field would always be visible in the
struct can_frame. So if it is here, use it. Having it sometimes used,
sometimes not would be confusing for the user.

Basically, that would replace the use of the can_len2dlc() function.

I also think it should be extended to CAN-FD, but no need to elaborate
more on this at the moment: let's keep the discussion focused on
classical CAN and tackle this later.

> BUT it makes sense to use this test to detect cases, where someone
> forgot to switch on CAN_CTRLMODE_RAW_DLC or the driver doesn't support
> it and the programmer did not check the ctrlmode netlink return values.

My view: if the user tries to send raw_dlc bigger than 8 and
CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active, return
-EINVAL.

>> Then, the drivers/users would only have to manage scenarios 1 and 2
>> (and 2 can be ignored if CAN_CTRLMODE_RAW_DLC is not advertised).
>>
>>>>
>>>> If I understand well, the idea would be not to use a setsockopt() but
>>>> instead rely on some logic on the can_dlc and raw_dlc to determine
>>>> which one to use.
>>>>
>>>> Unfortunately, this approach has one issue in the TX path that could
>>>> break existing applications.
>>>>
>>>> Consider below code (which I think is fairly realistic):
>>>>
>>>> void send_frame (int soc, canid_t can_id, char *data, size_t len)
>>>> {
>>>>       struct can_frame cf;
>>>>       size_t dlc = len > sizeof(cf.data) ? sizeof(cf.data) : len;
>>>>
>>>>       cf.can_id = can_id;
>>>>       cf.can_dlc = dlc;
>>>>       memcpy(cf.data, data, dlc);
>>>>
>>>>       write(soc, &cf, sizeof(cf));
>>>> }
>>>>
>>>> Here, the user did not initialize the can frame (cf) but assigned all
>>>> the relevant fields manually. Because cf is not initialized, the newly
>>>> introduced cf.dlc_raw field would have any of the values which was
>>>> present on the stack at the moment (rationale: the C standard does not
>>>> guarantee zero initialization). If 9 <= raw_dlc <= 15, the can frame
>>>> will be transmitted with a bad DLC value. If raw_dlc > 15, the can
>>>> frame will be discarded.
>>>
>>> No, this is not what I wrote. With my suggestion you need to populate
>>> both dlc elements to use the new "raw dlc" feature.
>>>
>>> if (can_dlc == 8) && (9 <= raw_dlc <= 15)
>>> => put raw_dlc value into the controller register
>>> else
>>> => put can_dlc value into the controller register
>>>
>>> When you have a test system to make security tests and you enable
>>> CAN_CTRLMODE_RAW_DLC on a specific CAN interface - which applications
>>> would you run to send CAN frames on this interface?
>>>
>>> I assume only the test application which is really aware of setting
>>> can_dlc and raw_dlc correctly.
>>
>> My point is that this assumption is dangerous.
>>
>> I do not think I made myself clear. I am speaking about old "non-DLC
>> aware" code running in CAN_CTRLMODE_RAW_DLC context.
>>
>> Consider two applications: the first application is a test application
>> which is "DLC aware", the second is a legacy application which
>> contains the code which I presented in my previous message.
>>
>> A user who wants to run both in parallel sets CAN_CTRLMODE_RAW_DLC at
>> netlink level. First application works fine, second application which
>> contains the legacy code gets impacted as explained previously and
>> stops behaving as intended.
>>
>> Newly introduced option should not break existing code regardless
>> why. In opposition, we can introduce new rules if these are strictly
>> tied to the new option.
>
> Yes but it doesn't help you with AF_PACKET.
>
>> In my mind, imposing a rule that old code should not be used in
>> CAN_CTRLMODE_RAW_DLC context would create a huge pitfall and would
>> violate the "don't break userland" principle.
>
> CAN_RAW_RAW_DLC could help in the case of legacy code only when they are
> using CAN_RAW sockets. But nothing more.
>
> Therefore we need the robustness on driver level with the checking of
> can_dlc == 8.

Will comment when I have a better understanding of AF_PACKET.

>>>> For userland, I think that this is acceptable because the very instant
>>>> the user calls setsockopt() with the CAN_RAW_RAW_DLC, he should be
>>>> aware of the consequences and should resign to use can_dlc field as a
>>>> plain length.
>>>
>>> Not filling can_dlc would cause tons of changes for sanity checks and
>>> feature switches.
>>
>> I never spoke about "not filling can_dlc". My point is to not use it
>> as a length but use it as a DLC according to ISO definition. In my
>> approach, can_dlc should always be filled with the raw DLC value.
>
> Ok, let me rephrase: Filling can_dlc with something else than a plain
> length information 0..8 ;-)

Got it :-)

From my preliminary study, not so much changes or sanity check would
need to be added:

  1. The current can_dlc sanity checks in can_send() and can_rcv() in
     net/can/af_can.c definitely needs to be adjusted (but this would
     also be the case for the raw_dlc field solution).

  2. In the kernel Rx and Tx paths: the length should not be accessed
     directly anymore but through a getter function.

  3. In the kernel: drivers "DLC aware" need to adapt their code and
     use the length getter function.

  4. In user land: new "DLC aware" code should always use struct
     can_frame for classical CAN and check whether can_dlc is greater
     that CAN_MAX_DLEN before accessing data.

That's mostly all that would have to be adjusted.

>>> The 14 year old documentation in can.h says:
>>> @can_dlc: frame payload length in byte (0 .. 8)
>>>
>>> I will not break this established rule for a testing feature. The
>>> question from Joakim clearly showed: Don't touch this!
>>
>> My point is this is an expert feature: if you do not understand it, do
>> not use it and you are safe to go using the 14 years old definition.
>
> Full ACK - but you can't imagine what people do in the real world. Do
> you know these 'more helps more' guys that switch everything to 'on'? :-D

Switching everything options on through an "ip link set canX up ..."
should not cause damage. If the user is doing random setsockopt()
without any clues of what he is doing, then he is already a lost cause
for me...

> The approach with CAN_CTRLMODE_RAW_DLC and the described testing would
> be even robust against unintended miss-use.

Exception: can_dlc == 8 and a non-initialed raw_dlc gets a garbage
value between 9 and 15 from the stack (c.f. above comment).

>>> At the end it would have made more sense to call it can_frame.len in the
>>> first place. But this first came into our mind when CAN FD showed up.
>>> The discussion about the can_dlc meaning can be closed IMO. It is a
>>> plain length from 0..8 which is unfortunately named can_dlc.
>>
>> I agree that it should have been named len from the beginning, but as
>> a matter of fact it has been named "can_dlc". For me as a user, I
>> expect can_dlc to follow the DLC definition in ISO and so I *prefer*
>> to opt for the compromise of losing the plain length property rather
>> than losing the semantic meaning.
>
> I don't think you can claim to be a standard user. You are an expert for
> security testing and digged into the ISO standard and intentionally
> write DLC values >8 into controller registers. Users don't do things
> like this ;-)

You get a point :-)

Whatever the output of our discussion is, I have one strong
request. The linux kernel has a huge audience and has such, I think it
is our duty to be technically accurate even in small details.

I would like to add a comment in the struct can_frame to acknowledge
the discrepancy with the ISO. Some people who do not have access to
the ISO standard might refer instead to the SocketCAN documentation to
learn and understand the protocol and those people might use this
knowledge to program on other platforms (e.g. bare-metal
microcontrollers). Using the DLC as a plain length on those systems
might results in vulnerabilities in production and potentially safety
hazard.

That's also why I would like to see the Socket CAN DLC be used
according to ISO: so that we raise awareness on that issue and so that
users can transfer their knowledge to other platforms.

Updated comments would be something as below (rough draft):

@can_dlc: frame payload length in byte (0 .. 8). If the frame is a
      remote transmission request (RTR) @can_dlc represents the
      requested length and the actual payload length is 0
      bytes. ISO 11898-1 section 8.4.2.4 "DLC field" do allows DLC
      values from 0 to 15, however <TBD: will depend on the final
      output of this discussion>.

> Clean-up and renaming typically ends when it breaks APIs and commonly
> followed rules. There are many examples in Linux where these clean-up
> attempts have been canceled for these reasons.
>
> I would have liked to rename can_dlc to len too.
> But now that we have a compatible canfd_frame which can contain a
> can_frame this is somehow settled: can_dlc == len.
>
>> I would like to conclude by saying that I do necessarily think that
>> your approach of raw_dlc field is bad (if used in combination with the
>> setsockopt()). It has its pros and cons. However, if you ask me for
>> feedback, *my answer* is that *I prefer* to use can_dlc as a DLC. But
>> at the end of the day, I would be happy if the feature gets
>> implemented, regardless how, so that I can do my testing :-)
>
> So let's see how we can get there best :-)

Absolutely!


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22 12:23                             ` Vincent MAILHOL
@ 2020-10-22 13:28                               ` Oliver Hartkopp
  2020-10-22 15:46                                 ` Vincent MAILHOL
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-22 13:28 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean



On 22.10.20 14:23, Vincent MAILHOL wrote:
> On 22.10.20 16:15, Oliver Hartkopp wrote:
>> On 22.10.20 05:30, Vincent MAILHOL wrote:
>>> On 22.10.20 02:52, Oliver Hartkopp wrote:
>>>> On 21.10.20 13:55, Vincent MAILHOL wrote:
>>>>>> On 21.10.20 18:48, Oliver Hartkopp wrote:
>>>>
>>>>>> To be more compatible to non raw dlc CAN frame sources the tx handling
>>>>>> could also be like this:
>>>>>>
>>>>>> if ((can_dlc == CAN_MAX_DLEN) &&
>>>>>>         (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>>>>>> => use raw_dlc
>>>>>>
>>>
>>> In addition to that, we would have to decide what to do with malformed
>>> frames. If can_dlc and can_raw are set to contradictory values, do we
>>> drop the frame, do we just ignore the raw_dlc? For example, if can_dlc
>>> is 8 but raw_dlc is 2, I think that we should drop.
>>
>> The pseudo code is pretty clear on this: Use can_dlc.
>>
>> The nice thing about testing raw_dlc to be 9..15 when can_dlc == 8 is,
>> that we always have a correct CAN frame on the wire with 8 bytes of content.
>>
>> Ok, the raw_dlc might be an uninitialized value - but still everything
>> remains fine for operation. So why dropping frames here?
> 
> If the can_dlc and raw_dlc are in contradiction, then the user is
> trying to do something not supported. After introducing raw_dlc,
> someone might want to try sending such malformed frames. If the
> operation succeeds, he then might be fooled in thinking that he
> actually send, for example, a frame of eight bytes with a DLC of 2.

??

You obviously did not read my description / pseudo code :-(

There is no contradiction.

if (can_dlc == 8) && (raw_dlc > 8 && raw_dlc <= 15)
     dlc = raw_dlc;
else
     dlc = can_dlc;

can_dlc == 8 is the entry door to write dlc values from 9..15 into the 
controller register.
Which always leads to a valid CAN frame with 8 bytes.

> In such a scenario, I think it is better to tell the user: "what you
> are doing is not supported" by returning -EINVAL (similar to what is
> currently done when an invalid skb is sent).
> 
>> The problem of uninitialized values could be solved with a
>> CAN_RAW_RAW_DLC sockopt BUT you will be always able to send a CAN frame
>> with an AF_PACKET socket which has no CAN_RAW_RAW_DLC sockopt.
>>
>> And then?
> 
> I am not competent enough on that point to comment. I was not aware of
> the possibility of using AF_PACKET to send CAN frames. Do you have a
> link to any example of such code? I would be glad to study it and come
> back to you once I understand this point.

https://github.com/linux-can/can-tests/blob/master/netlayer/tst-packet.c

>> The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a
>> pretty safe option to me. Even if 'legacy' applications with
>> uninitialized raw_dlc send CAN frames or AF_PACKET users enable
>> CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a
>> fitting valid raw_dlc.
> 
> But what if that "fitting valid raw_dlc" is greater than 8?

This is the plan!

> The kernel

the CAN driver

> would see that frame as valid, a frame with a DLC greater than 8 will
> be sent on the wire and received by the other devices than might be
> connected on the CAN bus.

Exactly.

> 
>>>
>>> If we go this direction, I would propose below logic:
>>>
>>>    1. if raw_dlc equals can_dlc: nominal case.
>>
>> Is not tested. We would start on testing can_dlc == 8
>>
>>>
>>>    2. if can_dlc equals 8 and raw_dlc is greater than 8: use raw_dlc
>>>       case, already covered in your code.
>>
>> ACK
>>
>>>    3. if raw_dlc is 0, then consider it as unset: overwrite raw_dlc with
>>>       can_dlc (so that it becomes case 1).
>>
>> Is not tested like this. We would start on can_dlc == 8 - and then check
>> for valid raw_dlc 9..15
>>
>>>
>>>    4. Other scenarios: the frame is malformed: drop it.
>>
>> No dropping.
>>
>>> Logic should apply for both Tx and Rx paths.
>>
>> So test for can_dlc == 8 - and then check for valid raw_dlc 9..15 in the
>> user space?! Fine.
> 
> My view: users and drivers directly use raw_dlc without need to check
> for can_dlc == 8 (detailed explanation below).
> 
>> Btw. we can make sure that we do not have uninitialized raw_dlc value in
>> the rx path in the enabled drivers. In fact you could always use the
>> raw_dlc then.
> 
> My view is that if we introduce the raw_dlc field, we should always
> populate it with the DLC value as defined in ISO, even if
> CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active. User could
> then benefit from this field to get the DLC value (and maximum value
> would be either CAN_MAX_DLC or CAN_MAX_DLC RAW depending on the
> context).
> 
> Rationale is that the raw_dlc field would always be visible in the
> struct can_frame. So if it is here, use it. Having it sometimes used,
> sometimes not would be confusing for the user.

I wonder if it's more confusing to fill raw_dlc without 
CAN_CTRLMODE_RAW_DLC enabled?

> Basically, that would replace the use of the can_len2dlc() function.
> 
> I also think it should be extended to CAN-FD, but no need to elaborate
> more on this at the moment: let's keep the discussion focused on
> classical CAN and tackle this later.

Why do you want this?

You get of the bike and push your bicycle for what reason?

No.

>> BUT it makes sense to use this test to detect cases, where someone
>> forgot to switch on CAN_CTRLMODE_RAW_DLC or the driver doesn't support
>> it and the programmer did not check the ctrlmode netlink return values.
> 
> My view: if the user tries to send raw_dlc bigger than 8 and
> CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active, return
> -EINVAL.

raw_dlc has always to be bigger than 8. What am I explaining wrong all 
time that you don't get it?

(..)

>> Ok, let me rephrase: Filling can_dlc with something else than a plain
>> length information 0..8 ;-)
> 
> Got it :-)
> 
>  From my preliminary study, not so much changes or sanity check would
> need to be added:
> 
>    1. The current can_dlc sanity checks in can_send() and can_rcv() in
>       net/can/af_can.c definitely needs to be adjusted (but this would
>       also be the case for the raw_dlc field solution).
> 
>    2. In the kernel Rx and Tx paths: the length should not be accessed
>       directly anymore but through a getter function.
> 
>    3. In the kernel: drivers "DLC aware" need to adapt their code and
>       use the length getter function.
> 
>    4. In user land: new "DLC aware" code should always use struct
>       can_frame for classical CAN and check whether can_dlc is greater
>       that CAN_MAX_DLEN before accessing data.
> 
> That's mostly all that would have to be adjusted.
> 

No, it is not.
You have to go through all protocols, e.g. raw.c, bcm.c, gw.c, j1939, 
isotp.c which get a skb to be (sometimes) cloned by them. And when they 
need to modify the can_dlc value to send it to the userspace or 
whatever, you need to modify skb->data - and then you can't clone it 
anymore but need to skbcopy() them.
HERE you will get a really big effort. Occasionally I was also thinking 
using can_dlc would be nice but then I looked into above code and your 
use-case does not justify that effort/ risk and performance impact.

>>>> The 14 year old documentation in can.h says:
>>>> @can_dlc: frame payload length in byte (0 .. 8)
>>>>
>>>> I will not break this established rule for a testing feature. The
>>>> question from Joakim clearly showed: Don't touch this!
>>>
>>> My point is this is an expert feature: if you do not understand it, do
>>> not use it and you are safe to go using the 14 years old definition.
>>
>> Full ACK - but you can't imagine what people do in the real world. Do
>> you know these 'more helps more' guys that switch everything to 'on'? :-D
> 
> Switching everything options on through an "ip link set canX up ..."
> should not cause damage. If the user is doing random setsockopt()
> without any clues of what he is doing, then he is already a lost cause
> for me...
> 
>> The approach with CAN_CTRLMODE_RAW_DLC and the described testing would
>> be even robust against unintended miss-use.
> 
> Exception: can_dlc == 8 and a non-initialed raw_dlc gets a garbage
> value between 9 and 15 from the stack (c.f. above comment).

Right, and what would be the effect of this?

You know the answer: Nothing breaks.

> 
>>>> At the end it would have made more sense to call it can_frame.len in the
>>>> first place. But this first came into our mind when CAN FD showed up.
>>>> The discussion about the can_dlc meaning can be closed IMO. It is a
>>>> plain length from 0..8 which is unfortunately named can_dlc.
>>>
>>> I agree that it should have been named len from the beginning, but as
>>> a matter of fact it has been named "can_dlc". For me as a user, I
>>> expect can_dlc to follow the DLC definition in ISO and so I *prefer*
>>> to opt for the compromise of losing the plain length property rather
>>> than losing the semantic meaning.
>>
>> I don't think you can claim to be a standard user. You are an expert for
>> security testing and digged into the ISO standard and intentionally
>> write DLC values >8 into controller registers. Users don't do things
>> like this ;-)
> 
> You get a point :-)
> 
> Whatever the output of our discussion is, I have one strong
> request. The linux kernel has a huge audience and has such, I think it
> is our duty to be technically accurate even in small details.

But we failed to do so by naming the plain data length can_dlc. We have 
to deal with that legacy.
You can not fix that anymore.

> I would like to add a comment in the struct can_frame to acknowledge
> the discrepancy with the ISO. Some people who do not have access to
> the ISO standard might refer instead to the SocketCAN documentation to
> learn and understand the protocol and those people might use this
> knowledge to program on other platforms (e.g. bare-metal
> microcontrollers). Using the DLC as a plain length on those systems
> might results in vulnerabilities in production and potentially safety
> hazard.
> 
> That's also why I would like to see the Socket CAN DLC be used
> according to ISO: so that we raise awareness on that issue and so that
> users can transfer their knowledge to other platforms.

Vincent, stop discussing about having in can_dlc anything else then a 
plain length from 0..8.
I'm really getting tired on this after the 6th or whatever round.

Read and understand my mails.

Regards,
Oliver


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22 13:28                               ` Oliver Hartkopp
@ 2020-10-22 15:46                                 ` Vincent MAILHOL
  2020-10-22 17:06                                   ` Oliver Hartkopp
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-22 15:46 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean,
	Vincent Mailhol

On 22.10.20 22:28, Oliver Hartkopp wrote:
> On 22.10.20 14:23, Vincent MAILHOL wrote:
>> On 22.10.20 16:15, Oliver Hartkopp wrote:
>>> On 22.10.20 05:30, Vincent MAILHOL wrote:
>>>> On 22.10.20 02:52, Oliver Hartkopp wrote:
>>>>> On 21.10.20 13:55, Vincent MAILHOL wrote:
>>>>>>> On 21.10.20 18:48, Oliver Hartkopp wrote:
>>>>>
>>>>>>> To be more compatible to non raw dlc CAN frame sources the tx handling
>>>>>>> could also be like this:
>>>>>>>
>>>>>>> if ((can_dlc == CAN_MAX_DLEN) &&
>>>>>>>         (raw_dlc >= CAN_MAX_DLC && raw_dlc <= CAN_MAX_RAW_DLC))
>>>>>>> => use raw_dlc
>>>>>>>
>>>>
>>>> In addition to that, we would have to decide what to do with malformed
>>>> frames. If can_dlc and can_raw are set to contradictory values, do we
>>>> drop the frame, do we just ignore the raw_dlc? For example, if can_dlc
>>>> is 8 but raw_dlc is 2, I think that we should drop.
>>>
>>> The pseudo code is pretty clear on this: Use can_dlc.
>>>
>>> The nice thing about testing raw_dlc to be 9..15 when can_dlc == 8 is,
>>> that we always have a correct CAN frame on the wire with 8 bytes of content.
>>>
>>> Ok, the raw_dlc might be an uninitialized value - but still everything
>>> remains fine for operation. So why dropping frames here?
>>
>> If the can_dlc and raw_dlc are in contradiction, then the user is
>> trying to do something not supported. After introducing raw_dlc,
>> someone might want to try sending such malformed frames. If the
>> operation succeeds, he then might be fooled in thinking that he
>> actually send, for example, a frame of eight bytes with a DLC of 2.
>
> ??
>
> You obviously did not read my description / pseudo code :-(
>
> There is no contradiction.
>
> if (can_dlc == 8) && (raw_dlc > 8 && raw_dlc <= 15)
>      dlc = raw_dlc;
> else
>      dlc = can_dlc;
>
> can_dlc == 8 is the entry door to write dlc values from 9..15 into the
> controller register.
> Which always leads to a valid CAN frame with 8 bytes.

Your code is correct. Sorry, my comment was not about your code but
was to answer the "why dropping frames here" question. Just wanting to
discuss what we do if the raw_dlc holds an incoherent value.

>> In such a scenario, I think it is better to tell the user: "what you
>> are doing is not supported" by returning -EINVAL (similar to what is
>> currently done when an invalid skb is sent).
>>
>>> The problem of uninitialized values could be solved with a
>>> CAN_RAW_RAW_DLC sockopt BUT you will be always able to send a CAN frame
>>> with an AF_PACKET socket which has no CAN_RAW_RAW_DLC sockopt.
>>>
>>> And then?
>>
>> I am not competent enough on that point to comment. I was not aware of
>> the possibility of using AF_PACKET to send CAN frames. Do you have a
>> link to any example of such code? I would be glad to study it and come
>> back to you once I understand this point.
>
> https://github.com/linux-can/can-tests/blob/master/netlayer/tst-packet.c

Thanks, I will try to understand.

>>> The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a
>>> pretty safe option to me. Even if 'legacy' applications with
>>> uninitialized raw_dlc send CAN frames or AF_PACKET users enable
>>> CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a
>>> fitting valid raw_dlc.
>>
>> But what if that "fitting valid raw_dlc" is greater than 8?
>
> This is the plan!
>
>> The kernel
>
> the CAN driver
>
>> would see that frame as valid, a frame with a DLC greater than 8 will
>> be sent on the wire and received by the other devices than might be
>> connected on the CAN bus.
>
> Exactly.

And this is my point. The legacy application intended to send a frame
of 8 bytes with a DLC of 8 but on the wire, there is a frame of 8
bytes with a DLC greater than 8. The other devices on the CAN bus
might ignore it (if the device has a DLC filter) or might crash if
there is a vulnerability in DLC handling.

>>
>>>>
>>>> If we go this direction, I would propose below logic:
>>>>
>>>>    1. if raw_dlc equals can_dlc: nominal case.
>>>
>>> Is not tested. We would start on testing can_dlc == 8
>>>
>>>>
>>>>    2. if can_dlc equals 8 and raw_dlc is greater than 8: use raw_dlc
>>>>       case, already covered in your code.
>>>
>>> ACK
>>>
>>>>    3. if raw_dlc is 0, then consider it as unset: overwrite raw_dlc with
>>>>       can_dlc (so that it becomes case 1).
>>>
>>> Is not tested like this. We would start on can_dlc == 8 - and then check
>>> for valid raw_dlc 9..15
>>>
>>>>
>>>>    4. Other scenarios: the frame is malformed: drop it.
>>>
>>> No dropping.
>>>
>>>> Logic should apply for both Tx and Rx paths.
>>>
>>> So test for can_dlc == 8 - and then check for valid raw_dlc 9..15 in the
>>> user space?! Fine.
>>
>> My view: users and drivers directly use raw_dlc without need to check
>> for can_dlc == 8 (detailed explanation below).
>>
>>> Btw. we can make sure that we do not have uninitialized raw_dlc value in
>>> the rx path in the enabled drivers. In fact you could always use the
>>> raw_dlc then.
>>
>> My view is that if we introduce the raw_dlc field, we should always
>> populate it with the DLC value as defined in ISO, even if
>> CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active. User could
>> then benefit from this field to get the DLC value (and maximum value
>> would be either CAN_MAX_DLC or CAN_MAX_DLC RAW depending on the
>> context).
>>
>> Rationale is that the raw_dlc field would always be visible in the
>> struct can_frame. So if it is here, use it. Having it sometimes used,
>> sometimes not would be confusing for the user.
>
> I wonder if it's more confusing to fill raw_dlc without
> CAN_CTRLMODE_RAW_DLC enabled?
>
>> Basically, that would replace the use of the can_len2dlc() function.
>>
>> I also think it should be extended to CAN-FD, but no need to elaborate
>> more on this at the moment: let's keep the discussion focused on
>> classical CAN and tackle this later.
>
> Why do you want this?
>
> You get of the bike and push your bicycle for what reason?
>
> No.

Pushing the bike: yes, that was actually my motivation. One more time,
I am not telling you are wrong, just that I would do it in a different
way. I explained my view and you read it. I do not have more arguments
to bring.

>
>>> BUT it makes sense to use this test to detect cases, where someone
>>> forgot to switch on CAN_CTRLMODE_RAW_DLC or the driver doesn't support
>>> it and the programmer did not check the ctrlmode netlink return values.
>>
>> My view: if the user tries to send raw_dlc bigger than 8 and
>> CAN_CTRLMODE_RAW_DLC and CAN_RAW_RAW_DLC are not active, return
>> -EINVAL.
>
> raw_dlc has always to be bigger than 8. What am I explaining wrong all
> time that you don't get it?

Was not trying to prove you wrong but to debate other options.

>>> Ok, let me rephrase: Filling can_dlc with something else than a plain
>>> length information 0..8 ;-)
>>
>> Got it :-)
>>
>>  From my preliminary study, not so much changes or sanity check would
>> need to be added:
>>
>>    1. The current can_dlc sanity checks in can_send() and can_rcv() in
>>       net/can/af_can.c definitely needs to be adjusted (but this would
>>       also be the case for the raw_dlc field solution).
>>
>>    2. In the kernel Rx and Tx paths: the length should not be accessed
>>       directly anymore but through a getter function.
>>
>>    3. In the kernel: drivers "DLC aware" need to adapt their code and
>>       use the length getter function.
>>
>>    4. In user land: new "DLC aware" code should always use struct
>>       can_frame for classical CAN and check whether can_dlc is greater
>>       that CAN_MAX_DLEN before accessing data.
>>
>> That's mostly all that would have to be adjusted.
>>
>
> No, it is not.
> You have to go through all protocols, e.g. raw.c, bcm.c, gw.c, j1939,
> isotp.c which get a skb to be (sometimes) cloned by them. And when they
> need to modify the can_dlc value to send it to the userspace or
> whatever, you need to modify skb->data - and then you can't clone it
> anymore but need to skbcopy() them.
> HERE you will get a really big effort. Occasionally I was also thinking
> using can_dlc would be nice but then I looked into above code and your
> use-case does not justify that effort/ risk and performance impact.

I naively thought this part would not get impacted. I have yet to
understand the full impact in detail but you made your point clear.

You won :-)
Sorry for the long exchange and thank you for your patience.

>>>>> The 14 year old documentation in can.h says:
>>>>> @can_dlc: frame payload length in byte (0 .. 8)
>>>>>
>>>>> I will not break this established rule for a testing feature. The
>>>>> question from Joakim clearly showed: Don't touch this!
>>>>
>>>> My point is this is an expert feature: if you do not understand it, do
>>>> not use it and you are safe to go using the 14 years old definition.
>>>
>>> Full ACK - but you can't imagine what people do in the real world. Do
>>> you know these 'more helps more' guys that switch everything to 'on'? :-D
>>
>> Switching everything options on through an "ip link set canX up ..."
>> should not cause damage. If the user is doing random setsockopt()
>> without any clues of what he is doing, then he is already a lost cause
>> for me...
>>
>>> The approach with CAN_CTRLMODE_RAW_DLC and the described testing would
>>> be even robust against unintended miss-use.
>>
>> Exception: can_dlc == 8 and a non-initialed raw_dlc gets a garbage
>> value between 9 and 15 from the stack (c.f. above comment).
>
> Right, and what would be the effect of this?
>
> You know the answer: Nothing breaks.

A frame of 8 bytes and DLC between 9 and 15 is sent on the wire and
this has the potential to crash the other devices on the bus as
explained in my initial e-mails.

>>
>>>>> At the end it would have made more sense to call it can_frame.len in the
>>>>> first place. But this first came into our mind when CAN FD showed up.
>>>>> The discussion about the can_dlc meaning can be closed IMO. It is a
>>>>> plain length from 0..8 which is unfortunately named can_dlc.
>>>>
>>>> I agree that it should have been named len from the beginning, but as
>>>> a matter of fact it has been named "can_dlc". For me as a user, I
>>>> expect can_dlc to follow the DLC definition in ISO and so I *prefer*
>>>> to opt for the compromise of losing the plain length property rather
>>>> than losing the semantic meaning.
>>>
>>> I don't think you can claim to be a standard user. You are an expert for
>>> security testing and digged into the ISO standard and intentionally
>>> write DLC values >8 into controller registers. Users don't do things
>>> like this ;-)
>>
>> You get a point :-)
>>
>> Whatever the output of our discussion is, I have one strong
>> request. The linux kernel has a huge audience and has such, I think it
>> is our duty to be technically accurate even in small details.
>
> But we failed to do so by naming the plain data length can_dlc. We have
> to deal with that legacy.
> You can not fix that anymore.
>
>> I would like to add a comment in the struct can_frame to acknowledge
>> the discrepancy with the ISO. Some people who do not have access to
>> the ISO standard might refer instead to the SocketCAN documentation to
>> learn and understand the protocol and those people might use this
>> knowledge to program on other platforms (e.g. bare-metal
>> microcontrollers). Using the DLC as a plain length on those systems
>> might results in vulnerabilities in production and potentially safety
>> hazard.
>>
>> That's also why I would like to see the Socket CAN DLC be used
>> according to ISO: so that we raise awareness on that issue and so that
>> users can transfer their knowledge to other platforms.
>
> Vincent, stop discussing about having in can_dlc anything else then a
> plain length from 0..8.
> I'm really getting tired on this after the 6th or whatever round.
>
> Read and understand my mails.

Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22 15:46                                 ` Vincent MAILHOL
@ 2020-10-22 17:06                                   ` Oliver Hartkopp
  2020-10-23 10:36                                     ` Vincent MAILHOL
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-22 17:06 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean



On 22.10.20 17:46, Vincent MAILHOL wrote:
> On 22.10.20 22:28, Oliver Hartkopp wrote:
>> On 22.10.20 14:23, Vincent MAILHOL wrote:

>>>> The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a
>>>> pretty safe option to me. Even if 'legacy' applications with
>>>> uninitialized raw_dlc send CAN frames or AF_PACKET users enable
>>>> CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a
>>>> fitting valid raw_dlc.
>>>
>>> But what if that "fitting valid raw_dlc" is greater than 8?
>>
>> This is the plan!
>>
>>> The kernel
>>
>> the CAN driver
>>
>>> would see that frame as valid, a frame with a DLC greater than 8 will
>>> be sent on the wire and received by the other devices than might be
>>> connected on the CAN bus.
>>
>> Exactly.
> 
> And this is my point. The legacy application intended to send a frame
> of 8 bytes with a DLC of 8 but on the wire, there is a frame of 8
> bytes with a DLC greater than 8. The other devices on the CAN bus
> might ignore it (if the device has a DLC filter) or might crash if
> there is a vulnerability in DLC handling.

But this is your intention, right?

You wanted to test this behaviour. This is no normal operation - it's a 
*test* mode to detect said vulnerabilities in DLC handling.

And what do you mean with "if the device has a DLC filter"?

You told me the DLCs from 9..15 are correct from the ISO standpoint. 
That a good programmer checks the DLC and makes sure he only processes 
max. 8 bytes is a common thing and no 'filtering'.

You might introduce CAN_RAW_RAW_DLC sockopt for CAN_RAW sockets but when 
you use packet sockets e.g. with Wireshark and forge some CAN frames 
there your only chance to have proper 0..8 DLCs is to disable 
CAN_CTRLMODE_RAW_DLC.

Btw. do you really see any legacy SocketCAN applications (*together* 
with your testing application on the same Linux host) where you don't 
have the source code to check whether they properly initialize the 
reserved/padding bytes?

You can also use the can-gw to let 'malicious' CAN apps run on a private 
virtual CAN. Forwarded modified CAN frames definitely initialize the 
reserved/padding bytes.

>>> Basically, that would replace the use of the can_len2dlc() function.
>>>
>>> I also think it should be extended to CAN-FD, but no need to elaborate
>>> more on this at the moment: let's keep the discussion focused on
>>> classical CAN and tackle this later.
>>
>> Why do you want this?
>>
>> You get of the bike and push your bicycle for what reason?
>>
>> No.
> 
> Pushing the bike: yes, that was actually my motivation. One more time,
> I am not telling you are wrong, just that I would do it in a different
> way. I explained my view and you read it. I do not have more arguments
> to bring.

/me either ...

https://youtu.be/aKnX5wci404?t=34

>>>> Ok, let me rephrase: Filling can_dlc with something else than a plain
>>>> length information 0..8 ;-)
>>>
>>> Got it :-)
>>>
>>>   From my preliminary study, not so much changes or sanity check would
>>> need to be added:
>>>
>>>     1. The current can_dlc sanity checks in can_send() and can_rcv() in
>>>        net/can/af_can.c definitely needs to be adjusted (but this would
>>>        also be the case for the raw_dlc field solution).
>>>
>>>     2. In the kernel Rx and Tx paths: the length should not be accessed
>>>        directly anymore but through a getter function.
>>>
>>>     3. In the kernel: drivers "DLC aware" need to adapt their code and
>>>        use the length getter function.
>>>
>>>     4. In user land: new "DLC aware" code should always use struct
>>>        can_frame for classical CAN and check whether can_dlc is greater
>>>        that CAN_MAX_DLEN before accessing data.
>>>
>>> That's mostly all that would have to be adjusted.
>>>
>>
>> No, it is not.
>> You have to go through all protocols, e.g. raw.c, bcm.c, gw.c, j1939,
>> isotp.c which get a skb to be (sometimes) cloned by them. And when they
>> need to modify the can_dlc value to send it to the userspace or
>> whatever, you need to modify skb->data - and then you can't clone it
>> anymore but need to skbcopy() them.
>> HERE you will get a really big effort. Occasionally I was also thinking
>> using can_dlc would be nice but then I looked into above code and your
>> use-case does not justify that effort/ risk and performance impact.
> 
> I naively thought this part would not get impacted. I have yet to
> understand the full impact in detail but you made your point clear.
> 
> You won :-)
> Sorry for the long exchange and thank you for your patience.

I really don't want to 'win'. But by the time the features and 
functionalities have been grown and many people rely on its 
functionality and performance.

The discussion helps to find the hopefully best solution and brings all 
of us to new insights.

The difference is to make a new door into a house or to replace its 
entire water system. You need a VERY good reason to replace the water 
system ... when you want a new door.

>>>> The approach with CAN_CTRLMODE_RAW_DLC and the described testing would
>>>> be even robust against unintended miss-use.
>>>
>>> Exception: can_dlc == 8 and a non-initialed raw_dlc gets a garbage
>>> value between 9 and 15 from the stack (c.f. above comment).
>>
>> Right, and what would be the effect of this?
>>
>> You know the answer: Nothing breaks.
> 
> A frame of 8 bytes and DLC between 9 and 15 is sent on the wire and
> this has the potential to crash the other devices on the bus as
> explained in my initial e-mails.

Yes. But that is what you wanted to archive when you enable 
CAN_CTRLMODE_RAW_DLC ¯\_(ツ)_/¯
(see my comment above)

Regards,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-22 17:06                                   ` Oliver Hartkopp
@ 2020-10-23 10:36                                     ` Vincent MAILHOL
  2020-10-23 16:47                                       ` Oliver Hartkopp
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-23 10:36 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean

On 23.10.20 02:06, Oliver Hartkopp wrote:
> On 22.10.20 17:46, Vincent MAILHOL wrote:
>> On 22.10.20 22:28, Oliver Hartkopp wrote:
>>> On 22.10.20 14:23, Vincent MAILHOL wrote:
>
>>>>> The above pseudo code together with CAN_CTRLMODE_RAW_DLC seems to be a
>>>>> pretty safe option to me. Even if 'legacy' applications with
>>>>> uninitialized raw_dlc send CAN frames or AF_PACKET users enable
>>>>> CAN_CTRLMODE_RAW_DLC we always end up with a proper can_dlc == 8  and a
>>>>> fitting valid raw_dlc.
>>>>
>>>> But what if that "fitting valid raw_dlc" is greater than 8?
>>>
>>> This is the plan!
>>>
>>>> The kernel
>>>
>>> the CAN driver
>>>
>>>> would see that frame as valid, a frame with a DLC greater than 8 will
>>>> be sent on the wire and received by the other devices than might be
>>>> connected on the CAN bus.
>>>
>>> Exactly.
>>
>> And this is my point. The legacy application intended to send a frame
>> of 8 bytes with a DLC of 8 but on the wire, there is a frame of 8
>> bytes with a DLC greater than 8. The other devices on the CAN bus
>> might ignore it (if the device has a DLC filter) or might crash if
>> there is a vulnerability in DLC handling.
>
> But this is your intention, right?
>
> You wanted to test this behaviour. This is no normal operation - it's a
> *test* mode to detect said vulnerabilities in DLC handling.
>
> And what do you mean with "if the device has a DLC filter"?

A DLC filter is a sub-component of a CAN frame filter which is a type
of Intrusion Prevention System (IPS). The CAN frame filter consists of
an allow list which entries are usually CAN IDs and DLCs. If the CAN
ID and DLC of a received frame do not match any of the entries in the
allow list, the frame is directly discarded. This is also sometimes
referred to as CAN firewall. Modern CAN gateways are starting to implement
such protection mechanism (in addition to the IDS).

If an ECU has a filter rule to only allow DLC equal to 8, it would
discard a frame with a DLC greater than 8 even if the length is
actually 8.

> You told me the DLCs from 9..15 are correct from the ISO standpoint.
> That a good programmer checks the DLC and makes sure he only processes
> max. 8 bytes is a common thing and no 'filtering'.
>
> You might introduce CAN_RAW_RAW_DLC sockopt for CAN_RAW sockets but when
> you use packet sockets e.g. with Wireshark and forge some CAN frames
> there your only chance to have proper 0..8 DLCs is to disable
> CAN_CTRLMODE_RAW_DLC.

Did not think of that use case but yes, I agree that
CAN_CTRLMODE_RAW_DLC is needed. I see CAN_RAW_RAW_DLC as an addition,
not a replacement.

> Btw. do you really see any legacy SocketCAN applications (*together*
> with your testing application on the same Linux host) where you don't
> have the source code to check whether they properly initialize the
> reserved/padding bytes?
>
> You can also use the can-gw to let 'malicious' CAN apps run on a private
> virtual CAN. Forwarded modified CAN frames definitely initialize the
> reserved/padding bytes.

Here, I am replying to you about the case of the 'legacy' applications
with uninitialized raw_dlc send CAN frames.

Even if this is *my* intention, is it the intention of every other
user activating CAN_CTRLMODE_RAW_DLC?

Consider either of:

  * the newbie user who just wants a normal netlink configuration but
    copy/pasted the command from the internet without realising the
    raw_dlc option is here.

  * the 'more helps more' guys that switch everything to 'on' which
    you mentioned before. This guys does not understand the RAW_DLC
    thing but yet activated it.

  * the expert user who turned on the feature for tests but also runs
    legacy applications in parallel or after doing the tests.

All of these users are subjected to the issue on the legacy
application I explained. It is not their intention to send DLC greater
than 8.

Furthermore, the first two users do not necessarily know how to
program. They are using downloaded application and do not have the
knowledge to check for the issue nor to even understand it. (The third
user should understand. Maybe he or she is not the best example, wish
I had started my argumentation with the first two user cases).

I see two options:

  1. The user used an expert command, it is his responsibility: we do
     not care, it is his fault.

  2. We (as kernel hackers) bare responsibility for all usage scenario
     of the "ip addr set canX..." options and do not allow a scenario
     which can break an existing application.

My vote is 2. I draw the line at the code level: user (as a
programmer) is responsible for the code he or she writes but we (as
kernel hackers) try to prevent any system configuration from breaking
existing applications which are working fine.

This my interpretation of the "do not break user land" rule.

>>>> Basically, that would replace the use of the can_len2dlc() function.
>>>>
>>>> I also think it should be extended to CAN-FD, but no need to elaborate
>>>> more on this at the moment: let's keep the discussion focused on
>>>> classical CAN and tackle this later.
>>>
>>> Why do you want this?
>>>
>>> You get of the bike and push your bicycle for what reason?
>>>
>>> No.
>>
>> Pushing the bike: yes, that was actually my motivation. One more time,
>> I am not telling you are wrong, just that I would do it in a different
>> way. I explained my view and you read it. I do not have more arguments
>> to bring.
>
> /me either ...
>
> https://youtu.be/aKnX5wci404?t=34
>
>>>>> Ok, let me rephrase: Filling can_dlc with something else than a plain
>>>>> length information 0..8 ;-)
>>>>
>>>> Got it :-)
>>>>
>>>>   From my preliminary study, not so much changes or sanity check would
>>>> need to be added:
>>>>
>>>>     1. The current can_dlc sanity checks in can_send() and can_rcv() in
>>>>        net/can/af_can.c definitely needs to be adjusted (but this would
>>>>        also be the case for the raw_dlc field solution).
>>>>
>>>>     2. In the kernel Rx and Tx paths: the length should not be accessed
>>>>        directly anymore but through a getter function.
>>>>
>>>>     3. In the kernel: drivers "DLC aware" need to adapt their code and
>>>>        use the length getter function.
>>>>
>>>>     4. In user land: new "DLC aware" code should always use struct
>>>>        can_frame for classical CAN and check whether can_dlc is greater
>>>>        that CAN_MAX_DLEN before accessing data.
>>>>
>>>> That's mostly all that would have to be adjusted.
>>>>
>>>
>>> No, it is not.
>>> You have to go through all protocols, e.g. raw.c, bcm.c, gw.c, j1939,
>>> isotp.c which get a skb to be (sometimes) cloned by them. And when they
>>> need to modify the can_dlc value to send it to the userspace or
>>> whatever, you need to modify skb->data - and then you can't clone it
>>> anymore but need to skbcopy() them.
>>> HERE you will get a really big effort. Occasionally I was also thinking
>>> using can_dlc would be nice but then I looked into above code and your
>>> use-case does not justify that effort/ risk and performance impact.
>>
>> I naively thought this part would not get impacted. I have yet to
>> understand the full impact in detail but you made your point clear.
>>
>> You won :-)
>> Sorry for the long exchange and thank you for your patience.
>
> I really don't want to 'win'. But by the time the features and
> functionalities have been grown and many people rely on its
> functionality and performance.
>
> The discussion helps to find the hopefully best solution and brings all
> of us to new insights.
>
> The difference is to make a new door into a house or to replace its
> entire water system. You need a VERY good reason to replace the water
> system ... when you want a new door.

OK then let me try to re-explain another point.

I understand that you do not want to drop malformed frames so I stop
replying on that because I feel that I would only annoy you more. But
in reality I do not yet understand why you do not want to.

Below are all the valid pairs of Lengths and DLCs. Every pair outside
of the table is incorrect.

    +-----------+-----------+
    | Length    | DLC       |
    | (can_dlc) | (raw_dlc) |
    +-----------+-----------+
    | 0         | 0         |
    | 1         | 1         |
    | 2         | 2         |
    | 3         | 3         |
    | 4         | 4         |
    | 5         | 5         |
    | 6         | 6         |
    | 7         | 7         |
    | 8         | 8..15     |
    +-----------+-----------+

If the user sets, let say, can_dlc to 8 and raw_dlc to 2, he expects
to send a frame of length 8 and *DLC 2*. Such a frame does not exist
and is malformed. Your code lets this frame pass through and
eventually, the driver sends a frame of length 8 and *DLC 8* on the
wire: not what the user requested.

My point is why send something different than requested. If it is
malformed, refuse to send and tell the user "hey this is wrong".

What is the rationale for not dropping invalid frames?


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-23 10:36                                     ` Vincent MAILHOL
@ 2020-10-23 16:47                                       ` Oliver Hartkopp
  2020-10-24  5:25                                         ` Vincent MAILHOL
  0 siblings, 1 reply; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-23 16:47 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean



On 23.10.20 12:36, Vincent MAILHOL wrote:
> On 23.10.20 02:06, Oliver Hartkopp wrote:
>> On 22.10.20 17:46, Vincent MAILHOL wrote:

>> And what do you mean with "if the device has a DLC filter"?
> 
> A DLC filter is a sub-component of a CAN frame filter which is a type
> of Intrusion Prevention System (IPS). The CAN frame filter consists of
> an allow list which entries are usually CAN IDs and DLCs. If the CAN
> ID and DLC of a received frame do not match any of the entries in the
> allow list, the frame is directly discarded. This is also sometimes
> referred to as CAN firewall. Modern CAN gateways are starting to implement
> such protection mechanism (in addition to the IDS).
> 
> If an ECU has a filter rule to only allow DLC equal to 8, it would
> discard a frame with a DLC greater than 8 even if the length is
> actually 8.

Yes, which is fine. If you want your CAN network work as expected, do 
not enable CAN_CTRLMODE_RAW_DLC.

If you want to test the behaviour of other nodes in your network enable 
CAN_CTRLMODE_RAW_DLC.

>> You told me the DLCs from 9..15 are correct from the ISO standpoint.
>> That a good programmer checks the DLC and makes sure he only processes
>> max. 8 bytes is a common thing and no 'filtering'.
>>
>> You might introduce CAN_RAW_RAW_DLC sockopt for CAN_RAW sockets but when
>> you use packet sockets e.g. with Wireshark and forge some CAN frames
>> there your only chance to have proper 0..8 DLCs is to disable
>> CAN_CTRLMODE_RAW_DLC.
> 
> Did not think of that use case but yes, I agree that
> CAN_CTRLMODE_RAW_DLC is needed. I see CAN_RAW_RAW_DLC as an addition,
> not a replacement.

Yes. And CAN_RAW_RAW_DLC is also pretty easy to implement on the tx 
side. But as I already wrote: It still does not help with AF_PACKET sockets.

>> Btw. do you really see any legacy SocketCAN applications (*together*
>> with your testing application on the same Linux host) where you don't
>> have the source code to check whether they properly initialize the
>> reserved/padding bytes?

Do you?

>> You can also use the can-gw to let 'malicious' CAN apps run on a private
>> virtual CAN. Forwarded modified CAN frames definitely initialize the
>> reserved/padding bytes.
> 
> Here, I am replying to you about the case of the 'legacy' applications
> with uninitialized raw_dlc send CAN frames.
> 
> Even if this is *my* intention, is it the intention of every other
> user activating CAN_CTRLMODE_RAW_DLC?
> 
> Consider either of:
> 
>    * the newbie user who just wants a normal netlink configuration but
>      copy/pasted the command from the internet without realising the
>      raw_dlc option is here.

You can enable the option only as root user. You can not protect every 
noob. If you fiddle with things you have no clue about, you can fail. 
That is a learning curve :-)

We provide reasonable defaults.

>    * the 'more helps more' guys that switch everything to 'on' which
>      you mentioned before. This guys does not understand the RAW_DLC
>      thing but yet activated it.
> 
>    * the expert user who turned on the feature for tests but also runs
>      legacy applications in parallel or after doing the tests.

Again, as I asked before: What legacy applications for the (Open Source) 
SocketCAN could there be on an *experts* system where he could not look 
into to search uninitialized CAN frame structs?

IMO this is an academical question without value.

> 
> All of these users are subjected to the issue on the legacy
> application I explained. It is not their intention to send DLC greater
> than 8.

Then they should not enable CAN_CTRLMODE_RAW_DLC.

> Furthermore, the first two users do not necessarily know how to
> program. They are using downloaded application and do not have the
> knowledge to check for the issue nor to even understand it. (The third
> user should understand. Maybe he or she is not the best example, wish
> I had started my argumentation with the first two user cases).
> 
> I see two options:
> 
>    1. The user used an expert command, it is his responsibility: we do
>       not care, it is his fault.
> 
>    2. We (as kernel hackers) bare responsibility for all usage scenario
>       of the "ip addr set canX..." options and do not allow a scenario
>       which can break an existing application.
> 
> My vote is 2. I draw the line at the code level: user (as a
> programmer) is responsible for the code he or she writes but we (as
> kernel hackers) try to prevent any system configuration from breaking
> existing applications which are working fine.

By default CAN_CTRLMODE_RAW_DLC is disabled.

Even with CAN_CTRLMODE_RAW_DLC enabled all existing applications would 
still work fine.

They will still be able to send and receive CAN frames having proper 
length information in can_dlc - so nothing breaks.

The only thing that could happen is, that their sent CAN frames with 8 
bytes of payload may have a DLC from 8..15 which is still covered by the 
ISO standard. This is no fault.

You can not take responsibility for broken implementations on other ECUs.

>>> You won :-)
>>> Sorry for the long exchange and thank you for your patience.
>>
>> I really don't want to 'win'. But by the time the features and
>> functionalities have been grown and many people rely on its
>> functionality and performance.
>>
>> The discussion helps to find the hopefully best solution and brings all
>> of us to new insights.
>>
>> The difference is to make a new door into a house or to replace its
>> entire water system. You need a VERY good reason to replace the water
>> system ... when you want a new door.
> 
> OK then let me try to re-explain another point.
> 
> I understand that you do not want to drop malformed frames so I stop
> replying on that because I feel that I would only annoy you more. But
> in reality I do not yet understand why you do not want to.
> 
> Below are all the valid pairs of Lengths and DLCs. Every pair outside
> of the table is incorrect.
> 
>      +-----------+-----------+
>      | Length    | DLC       |
>      | (can_dlc) | (raw_dlc) |
>      +-----------+-----------+
>      | 0         | 0         |
>      | 1         | 1         |
>      | 2         | 2         |
>      | 3         | 3         |
>      | 4         | 4         |
>      | 5         | 5         |
>      | 6         | 6         |
>      | 7         | 7         |
>      | 8         | 8..15     |
>      +-----------+-----------+
> 
> If the user sets, let say, can_dlc to 8 and raw_dlc to 2, he expects
> to send a frame of length 8 and *DLC 2*.

This is BS! How can you create such an impossible case after all of our 
discussions?

I write it AGAIN, ONLY FOR YOU:

if (can_dlc == 8) && (raw_dlc > 8 && raw_dlc <= 15)
     dlc = raw_dlc;
else
     dlc = can_dlc;

can_dlc == 8 is the entry door to write dlc values from 9..15 into the 
controller register.
Which always leads to a valid CAN frame with 8 bytes.

So what is so hard to write this into the documentation then?

There is no expectation that anything else then can_dlc is used when 
raw_dlc is set to 2.

And therefore there will never be an invalid frame.

Regards,
Oliver


> Such a frame does not exist
> and is malformed. Your code lets this frame pass through and
> eventually, the driver sends a frame of length 8 and *DLC 8* on the
> wire: not what the user requested.
> 
> My point is why send something different than requested. If it is
> malformed, refuse to send and tell the user "hey this is wrong".
> 
> What is the rationale for not dropping invalid frames?
> 
> 
> Yours sincerely,
> Vincent Mailhol
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-23 16:47                                       ` Oliver Hartkopp
@ 2020-10-24  5:25                                         ` Vincent MAILHOL
  2020-10-24 11:31                                           ` Oliver Hartkopp
  0 siblings, 1 reply; 45+ messages in thread
From: Vincent MAILHOL @ 2020-10-24  5:25 UTC (permalink / raw)
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean

On 24.10.20 01:47, Oliver Hartkopp wrote:
> On 23.10.20 12:36, Vincent MAILHOL wrote:
>> On 23.10.20 02:06, Oliver Hartkopp wrote:
>>> On 22.10.20 17:46, Vincent MAILHOL wrote:
>
>>> And what do you mean with "if the device has a DLC filter"?
>>
>> A DLC filter is a sub-component of a CAN frame filter which is a type
>> of Intrusion Prevention System (IPS). The CAN frame filter consists of
>> an allow list which entries are usually CAN IDs and DLCs. If the CAN
>> ID and DLC of a received frame do not match any of the entries in the
>> allow list, the frame is directly discarded. This is also sometimes
>> referred to as CAN firewall. Modern CAN gateways are starting to implement
>> such protection mechanism (in addition to the IDS).
>>
>> If an ECU has a filter rule to only allow DLC equal to 8, it would
>> discard a frame with a DLC greater than 8 even if the length is
>> actually 8.
>
> Yes, which is fine. If you want your CAN network work as expected, do
> not enable CAN_CTRLMODE_RAW_DLC.
>
> If you want to test the behaviour of other nodes in your network enable
> CAN_CTRLMODE_RAW_DLC.
>
>>> You told me the DLCs from 9..15 are correct from the ISO standpoint.
>>> That a good programmer checks the DLC and makes sure he only processes
>>> max. 8 bytes is a common thing and no 'filtering'.
>>>
>>> You might introduce CAN_RAW_RAW_DLC sockopt for CAN_RAW sockets but when
>>> you use packet sockets e.g. with Wireshark and forge some CAN frames
>>> there your only chance to have proper 0..8 DLCs is to disable
>>> CAN_CTRLMODE_RAW_DLC.
>>
>> Did not think of that use case but yes, I agree that
>> CAN_CTRLMODE_RAW_DLC is needed. I see CAN_RAW_RAW_DLC as an addition,
>> not a replacement.
>
> Yes. And CAN_RAW_RAW_DLC is also pretty easy to implement on the tx
> side. But as I already wrote: It still does not help with AF_PACKET sockets.
>
>>> Btw. do you really see any legacy SocketCAN applications (*together*
>>> with your testing application on the same Linux host) where you don't
>>> have the source code to check whether they properly initialize the
>>> reserved/padding bytes?
>
> Do you?
>
>>> You can also use the can-gw to let 'malicious' CAN apps run on a private
>>> virtual CAN. Forwarded modified CAN frames definitely initialize the
>>> reserved/padding bytes.
>>
>> Here, I am replying to you about the case of the 'legacy' applications
>> with uninitialized raw_dlc send CAN frames.
>>
>> Even if this is *my* intention, is it the intention of every other
>> user activating CAN_CTRLMODE_RAW_DLC?
>>
>> Consider either of:
>>
>>    * the newbie user who just wants a normal netlink configuration but
>>      copy/pasted the command from the internet without realising the
>>      raw_dlc option is here.
>
> You can enable the option only as root user. You can not protect every
> noob. If you fiddle with things you have no clue about, you can fail.
> That is a learning curve :-)
>
> We provide reasonable defaults.
>
>>    * the 'more helps more' guys that switch everything to 'on' which
>>      you mentioned before. This guys does not understand the RAW_DLC
>>      thing but yet activated it.
>>
>>    * the expert user who turned on the feature for tests but also runs
>>      legacy applications in parallel or after doing the tests.
>
> Again, as I asked before: What legacy applications for the (Open Source)
> SocketCAN could there be on an *experts* system where he could not look
> into to search uninitialized CAN frame structs?
>
> IMO this is an academical question without value.
>
>>
>> All of these users are subjected to the issue on the legacy
>> application I explained. It is not their intention to send DLC greater
>> than 8.
>
> Then they should not enable CAN_CTRLMODE_RAW_DLC.
>
>> Furthermore, the first two users do not necessarily know how to
>> program. They are using downloaded application and do not have the
>> knowledge to check for the issue nor to even understand it. (The third
>> user should understand. Maybe he or she is not the best example, wish
>> I had started my argumentation with the first two user cases).
>>
>> I see two options:
>>
>>    1. The user used an expert command, it is his responsibility: we do
>>       not care, it is his fault.
>>
>>    2. We (as kernel hackers) bare responsibility for all usage scenario
>>       of the "ip addr set canX..." options and do not allow a scenario
>>       which can break an existing application.
>>
>> My vote is 2. I draw the line at the code level: user (as a
>> programmer) is responsible for the code he or she writes but we (as
>> kernel hackers) try to prevent any system configuration from breaking
>> existing applications which are working fine.
>
> By default CAN_CTRLMODE_RAW_DLC is disabled.
>
> Even with CAN_CTRLMODE_RAW_DLC enabled all existing applications would
> still work fine.
>
> They will still be able to send and receive CAN frames having proper
> length information in can_dlc - so nothing breaks.
>
> The only thing that could happen is, that their sent CAN frames with 8
> bytes of payload may have a DLC from 8..15 which is still covered by the
> ISO standard. This is no fault.
>
> You can not take responsibility for broken implementations on other ECUs.

We clearly share a difference stance on this subject, I think it is
time to close the debate. I agree to go with your solution.

One more time, thank you for your time and patience :-)

>>>> You won :-)
>>>> Sorry for the long exchange and thank you for your patience.
>>>
>>> I really don't want to 'win'. But by the time the features and
>>> functionalities have been grown and many people rely on its
>>> functionality and performance.
>>>
>>> The discussion helps to find the hopefully best solution and brings all
>>> of us to new insights.
>>>
>>> The difference is to make a new door into a house or to replace its
>>> entire water system. You need a VERY good reason to replace the water
>>> system ... when you want a new door.
>>
>> OK then let me try to re-explain another point.
>>
>> I understand that you do not want to drop malformed frames so I stop
>> replying on that because I feel that I would only annoy you more. But
>> in reality I do not yet understand why you do not want to.
>>
>> Below are all the valid pairs of Lengths and DLCs. Every pair outside
>> of the table is incorrect.
>>
>>      +-----------+-----------+
>>      | Length    | DLC       |
>>      | (can_dlc) | (raw_dlc) |
>>      +-----------+-----------+
>>      | 0         | 0         |
>>      | 1         | 1         |
>>      | 2         | 2         |
>>      | 3         | 3         |
>>      | 4         | 4         |
>>      | 5         | 5         |
>>      | 6         | 6         |
>>      | 7         | 7         |
>>      | 8         | 8..15     |
>>      +-----------+-----------+
>>
>> If the user sets, let say, can_dlc to 8 and raw_dlc to 2, he expects
>> to send a frame of length 8 and *DLC 2*.
>
> This is BS! How can you create such an impossible case after all of our
> discussions?
>
> I write it AGAIN, ONLY FOR YOU:
>
> if (can_dlc == 8) && (raw_dlc > 8 && raw_dlc <= 15)
>      dlc = raw_dlc;
> else
>      dlc = can_dlc;
>
> can_dlc == 8 is the entry door to write dlc values from 9..15 into the
> controller register.
> Which always leads to a valid CAN frame with 8 bytes.
>
> So what is so hard to write this into the documentation then?
>
> There is no expectation that anything else then can_dlc is used when
> raw_dlc is set to 2.
>
> And therefore there will never be an invalid frame.

I saw your patch. Changing the name from raw_can to len8_dlc clears
the concerns I had for invalid frames. Thank you.

Thank you also for depreciating the can_dlc field and introducing the
len field in struct can_frame. It is a smart change.

I will now test the patch.


Yours sincerely,
Vincent Mailhol

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames
  2020-10-24  5:25                                         ` Vincent MAILHOL
@ 2020-10-24 11:31                                           ` Oliver Hartkopp
  0 siblings, 0 replies; 45+ messages in thread
From: Oliver Hartkopp @ 2020-10-24 11:31 UTC (permalink / raw)
  To: Vincent MAILHOL
  Cc: Marc Kleine-Budde, linux-can, kernel, Stéphane Grosjean



On 24.10.20 07:25, Vincent MAILHOL wrote:
> On 24.10.20 01:47, Oliver Hartkopp wrote:

>>>> Btw. do you really see any legacy SocketCAN applications (*together*
>>>> with your testing application on the same Linux host) where you don't
>>>> have the source code to check whether they properly initialize the
>>>> reserved/padding bytes?
>>
>> Do you?

You still didn't answer this hypothetical question. There must be a 
reason behind .. :-D

>> You can not take responsibility for broken implementations on other ECUs.
> 
> We clearly share a difference stance on this subject, I think it is
> time to close the debate. I agree to go with your solution.

Ok, together with this answer I also can close the debate :-)

> I saw your patch. Changing the name from raw_can to len8_dlc clears
> the concerns I had for invalid frames. Thank you.

Great! In fact it cost some time in the night where I woke up from sleep 
and think about "the overall goal what we want to archive" ;-)

The right naming lets developers make right assumptions about the 
functionality. Therefore it always remains tricky - and discussions like 
these. I needed two hours to write an rephrase the patch again and again ...

Also the ctrlmode has now a clear target of Classic CAN (CC) in its 
name: CC_LEN8_DLC

> Thank you also for depreciating the can_dlc field and introducing the
> len field in struct can_frame. It is a smart change.

Thanks. I always had the idea to fix this up after the introduction of 
CAN FD which uses a 'len' element too. I discovered myself at that time, 
that can_dlc was not the best matching choice - but Kernel APIs are 
written in stone for a good reason.

So I've checked what people do, when they need to rename structure 
elements in the Kernel API in the uapi section and found this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=31ba514b2fd0495796

> I will now test the patch.

So far it only is an idea for naming 'things' - but I'm happy you like it!

IMO we could continue with enabling a CAN driver to support 
CAN_CTRLMODE_CC_LEN8_DLC and see how it works in reality with CAN_RAW 
and AF_PACKET - and if we would finally need CAN_RAW_LEN8_DLC or not.

Best,
Oliver

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, back to index

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-19 19:05 [RFC]: can 2020-10-19 Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 01/16] can: proc: can_remove_proc(): silence remove_proc_entry warning Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 02/16] can: rx-offload: don't call kfree_skb() from IRQ context Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 03/16] can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard " Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 04/16] can: dev: can_get_len(): add a helper function to get the correct length of Classical frames Marc Kleine-Budde
2020-10-19 20:35   ` Oliver Hartkopp
2020-10-20  6:35     ` Marc Kleine-Budde
2020-10-20 11:30       ` Vincent Mailhol
2020-10-20 11:48         ` Marc Kleine-Budde
2020-10-20 12:38         ` Oliver Hartkopp
2020-10-20 15:02           ` Marc Kleine-Budde
2020-10-20 16:07           ` Vincent Mailhol
2020-10-20 17:04             ` Oliver Hartkopp
2020-10-20 18:50               ` Marc Kleine-Budde
2020-10-21  0:52               ` Vincent Mailhol
2020-10-21  6:23                 ` Vincent MAILHOL
2020-10-21  7:11                   ` Joakim Zhang
2020-10-21  7:21                     ` Marc Kleine-Budde
2020-10-21  7:48                       ` Joakim Zhang
2020-10-21  9:21                   ` Oliver Hartkopp
2020-10-21  9:48                     ` Oliver Hartkopp
2020-10-21 11:55                     ` Vincent MAILHOL
2020-10-21 17:52                       ` Oliver Hartkopp
2020-10-22  3:30                         ` Vincent MAILHOL
2020-10-22  7:15                           ` Oliver Hartkopp
2020-10-22 12:23                             ` Vincent MAILHOL
2020-10-22 13:28                               ` Oliver Hartkopp
2020-10-22 15:46                                 ` Vincent MAILHOL
2020-10-22 17:06                                   ` Oliver Hartkopp
2020-10-23 10:36                                     ` Vincent MAILHOL
2020-10-23 16:47                                       ` Oliver Hartkopp
2020-10-24  5:25                                         ` Vincent MAILHOL
2020-10-24 11:31                                           ` Oliver Hartkopp
2020-10-19 19:05 ` [net-rfc 05/16] can: dev: __can_get_echo_skb(): fix the returned length of CAN frame Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 06/16] can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 07/16] can: j1939: j1939_sk_bind(): return failure if netdev is down Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 08/16] can: isotp: Explain PDU in CAN_ISOTP help text Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 09/16] can: isotp: enable RX timeout handling in listen-only mode Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 10/16] can: ti_hecc: add missed clk_disable_unprepare() in error path Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 11/16] can: xilinx_can: handle failure cases of pm_runtime_get_sync Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 12/16] can: peak_usb: fix timestamp wrapping Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 13/16] can: peak_canfd: fix echo management when loopback is on Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 14/16] can: mcp251xfd: mcp251xfd_regmap_crc_read(): increase severity of CRC read error messages Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 15/16] can: mcp251xfd: mcp251xfd_regmap_nocrc_read(): fix semicolon.cocci warnings Marc Kleine-Budde
2020-10-19 19:05 ` [net-rfc 16/16] can: mcp251xfd: remove unneeded break Marc Kleine-Budde

Linux-Can Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-can/0 linux-can/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-can linux-can/ https://lore.kernel.org/linux-can \
		linux-can@vger.kernel.org
	public-inbox-index linux-can

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-can


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git