netdev.vger.kernel.org archive mirror
* [RFC] net: sch_generic: fq_codel vs pfifo_fast
@ 2019-03-27 16:56 Marc Kleine-Budde
  2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 16:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, linux-can, kernel, Dave Taht, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko

Hello,

on CAN networking hardware we (the CAN community) experience a lot of silent,
unwanted frame drops inside the kernel. (See the first patch for details.) So
here's a patch series to keep pfifo_fast as the default scheduler for CAN
hardware.

Consider this as an RFC. Regards,
Marc




* [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
@ 2019-03-27 16:56 ` Marc Kleine-Budde
  2019-03-27 17:14   ` Cong Wang
  2019-03-27 18:53   ` Uwe Kleine-König
  2019-03-27 16:56 ` [PATCH 2/2] can: dev: let all CAN devices " Marc Kleine-Budde
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 16:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, linux-can, kernel, Dave Taht, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Marc Kleine-Budde

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units (ECUs), also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. Frame loss due to corruption
is possible, but quite unusual.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently means that user
space, using send() or a similar syscall, doesn't get an error. However,
TCP's congestion control will detect dropped packets and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, user space thinks the packet has been sent without problems, because
send() returned without an error. pfifo_fast also drops skbs if the queue
length exceeds the maximum, but with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that user
space can slow down packet generation.

On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
at compile time, or set as the default at runtime with the sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of millions of CAN frames without a
frame drop. With fq_codel, on the other hand, more than one CAN frame per
thousand is lost.

As pointed out above, fq_codel is not suited for CAN hardware, so this patch
introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
existing "IFF_NO_QUEUE").

During the transition of a netdev from down to up state, the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.

[1] https://github.com/systemd/systemd/issues/9194

Cc: Dave Taht <dave.taht@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 include/linux/netdevice.h | 3 +++
 net/sched/sch_generic.c   | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 166fdc0a78b4..1867e27e3369 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1498,6 +1498,7 @@ struct net_device_ops {
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
+ * @IFF_FIFO_QUEUE: device must run with FIFO qdisc attached. skb drop without NET_XMIT_DROP is fatal
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1530,6 +1531,7 @@ enum netdev_priv_flags {
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
 	IFF_L3MDEV_RX_HANDLER		= 1<<29,
+	IFF_FIFO_QUEUE			= 1<<30,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1561,6 +1563,7 @@ enum netdev_priv_flags {
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
 #define IFF_L3MDEV_RX_HANDLER		IFF_L3MDEV_RX_HANDLER
+#define IFF_FIFO_QUEUE			IFF_FIFO_QUEUE
 
 /**
  *	struct net_device - The DEVICE structure.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 81356ef38d1d..c309d0751cbc 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1049,6 +1049,9 @@ static void attach_one_default_qdisc(struct net_device *dev,
 	struct Qdisc *qdisc;
 	const struct Qdisc_ops *ops = default_qdisc_ops;
 
+	if (dev->priv_flags & IFF_FIFO_QUEUE)
+		ops = &pfifo_fast_ops;
+
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
 
-- 
2.20.1



* [PATCH 2/2] can: dev: let all CAN devices use pfifo_fast as default scheduler
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
  2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
@ 2019-03-27 16:56 ` Marc Kleine-Budde
  2019-03-27 18:30 ` [RFC] net: sch_generic: fq_codel vs pfifo_fast Stephen Hemminger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 16:56 UTC (permalink / raw)
  To: netdev
  Cc: davem, linux-can, kernel, Dave Taht, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Marc Kleine-Budde

When using fq_codel and sending raw frames over CAN, which is the common use
case, user space thinks the packet has been sent without problems, because
send() returned without an error. pfifo_fast also drops skbs if the queue
length exceeds the maximum, but with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that user
space can slow down packet generation.

On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
at compile time, or set as the default at runtime with the sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of millions of CAN frames without a
frame drop. With fq_codel, on the other hand, more than one CAN frame per
thousand is lost.

This patch sets the flag IFF_FIFO_QUEUE on all CAN devices.

Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 drivers/net/can/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index c05e4d50d43d..34bcabc35127 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -646,6 +646,7 @@ static void can_setup(struct net_device *dev)
 
 	/* New-style flags. */
 	dev->flags = IFF_NOARP;
+	dev->priv_flags |= IFF_FIFO_QUEUE;
 	dev->features = NETIF_F_HW_CSUM;
 }
 
-- 
2.20.1



* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
@ 2019-03-27 17:14   ` Cong Wang
  2019-03-27 20:11     ` Marc Kleine-Budde
  2019-03-27 18:53   ` Uwe Kleine-König
  1 sibling, 1 reply; 25+ messages in thread
From: Cong Wang @ 2019-03-27 17:14 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Linux Kernel Network Developers, David Miller, linux-can, kernel,
	Dave Taht, Jamal Hadi Salim, Jiri Pirko

On Wed, Mar 27, 2019 at 9:56 AM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>
> For example CAN.
>
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. However frame loss due to
> corruption is possible, but a quite unusual phenomenon.
>
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not build with flow control or high
> CAN frame drop rates in mind.
>
> When using fq_codel, as soon as the queue reaches a certain delay based length,
> skbs from the head of the queue are silently dropped. Silently meaning that the
> user space using a send() or similar syscall doesn't get an error. However
> TCP's flow control algorithm will detect dropped packages and adjust the
> bandwidth accordingly.
>
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, the user space thinks the package has been sent without problems, because
> send() returned without an error. pfifo_fast will drop skbs, if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
> space can slow down the package generation.
>
> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
> during compile time, or set default during runtime with sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
> drop. On the other hand with fq_codel there is more then one lost CAN frame per
> thousand frames.
>
> As pointed out fq_codel is not suited for CAN hardware, so this patch
> introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
> existing "IFF_NO_QUEUE").
>
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.

I wonder if we should just allow an arbitrary default qdisc per netdevice
while you are at it. A private flag is simply a boolean; perhaps in the
future other types of devices will want other default qdiscs, so that
would make it more flexible.

Just a thought.

Thanks.


* Re: [RFC] net: sch_generic: fq_codel vs pfifo_fast
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
  2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
  2019-03-27 16:56 ` [PATCH 2/2] can: dev: let all CAN devices " Marc Kleine-Budde
@ 2019-03-27 18:30 ` Stephen Hemminger
  2019-03-27 19:24   ` Marc Kleine-Budde
  2019-10-22 12:47 ` [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware Vincent Prince
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2019-03-27 18:30 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: netdev, davem, linux-can, kernel, Dave Taht, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko

On Wed, 27 Mar 2019 17:56:30 +0100
Marc Kleine-Budde <mkl@pengutronix.de> wrote:

> Hello,
> 
> on CAN networking hardware we (the CAN community) experience a lot silent,
> unwanted frame drops inside the kernel. (See first patch for details.) So
> here's a patch series to keep pfifo_fast as default scheduler for CAN hardware
> by default.
> 
> Consider this as an RFC. Regards,
> Marc
> 
> 

Why do you set fq_codel as the default qdisc if you know it doesn't work right
for your environment? Is this a problem of distros wanting one value?


* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
  2019-03-27 17:14   ` Cong Wang
@ 2019-03-27 18:53   ` Uwe Kleine-König
  2019-03-27 19:27     ` Marc Kleine-Budde
  1 sibling, 1 reply; 25+ messages in thread
From: Uwe Kleine-König @ 2019-03-27 18:53 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: netdev, Jiri Pirko, Dave Taht, Jamal Hadi Salim, kernel,
	Cong Wang, linux-can, davem

On Wed, Mar 27, 2019 at 05:56:31PM +0100, Marc Kleine-Budde wrote:
> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
> 
> For example CAN.
> 
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. However frame loss due to
> corruption is possible, but a quite unusual phenomenon.
> 
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not build with flow control or high
> CAN frame drop rates in mind.
> 
> When using fq_codel, as soon as the queue reaches a certain delay based length,
> skbs from the head of the queue are silently dropped. Silently meaning that the
> user space using a send() or similar syscall doesn't get an error. However
> TCP's flow control algorithm will detect dropped packages and adjust the

s/package/packet/ here and in a few more locations in this commit log.

> bandwidth accordingly.
> 
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, the user space thinks the package has been sent without problems, because
> send() returned without an error. pfifo_fast will drop skbs, if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
> space can slow down the package generation.
> 
> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
> during compile time, or set default during runtime with sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
> drop. On the other hand with fq_codel there is more then one lost CAN frame per
> thousand frames.
> 
> As pointed out fq_codel is not suited for CAN hardware, so this patch
> introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
> existing "IFF_NO_QUEUE").
> 
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.
> 
> [1] https://github.com/systemd/systemd/issues/9194
> 
> Cc: Dave Taht <dave.taht@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
>  include/linux/netdevice.h | 3 +++
>  net/sched/sch_generic.c   | 3 +++
>  2 files changed, 6 insertions(+)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 166fdc0a78b4..1867e27e3369 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1498,6 +1498,7 @@ struct net_device_ops {
>   * @IFF_FAILOVER: device is a failover master device
>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>   * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
> + * @IFF_FIFO_QUEUE: device must run with FIFO qdisc attached. skb drop without NET_XMIT_DROP is fatal

Do you need the FIFO property or only that the qdisc doesn't silently
drop packets? I don't know which other qdiscs are around, but depending
on the answer to this question, qdiscs other than pfifo_fast might be
suitable.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Re: [RFC] net: sch_generic: fq_codel vs pfifo_fast
  2019-03-27 18:30 ` [RFC] net: sch_generic: fq_codel vs pfifo_fast Stephen Hemminger
@ 2019-03-27 19:24   ` Marc Kleine-Budde
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 19:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Pirko, netdev, Dave Taht, Jamal Hadi Salim, kernel,
	Cong Wang, linux-can, davem



On 3/27/19 7:30 PM, Stephen Hemminger wrote:
>> on CAN networking hardware we (the CAN community) experience a lot silent,
>> unwanted frame drops inside the kernel. (See first patch for details.) So
>> here's a patch series to keep pfifo_fast as default scheduler for CAN hardware
>> by default.
> 
> Why do you set fq_codel as default qdisc if you know it doesn't work right for your
> environment. Is this a distro's want one value problem?

This is a manifold problem.

- Consider a random Linux developer attaching one of the mainline
supported USB-CAN adapters to his/her development laptop and doing some
random CAN test. As far as I have heard, all modern Linux distributions
(but not Debian) enable fq_codel via the kernel .config or via systemd.
The user will experience dropped CAN frames within the first 1000
packets. This is not a good user experience.

- From the user space point of view the sysctl net.core.default_qdisc
behaves a bit strangely. Why can I enable a queuing discipline by default
that's known not to work on CAN devices? You might think the kernel
(should) know that CAN devices work best with pfifo_fast (or without
NET_SCHED at all).

- Addressing your question directly: mixed environments. fq_codel works
great on Ethernet, but terribly on CAN. Of course I can use "tc" to
configure it the way I want. But from my point of view it would be nice
if the kernel used a sane default and handled the net.core.default_qdisc
default in a sane way.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 18:53   ` Uwe Kleine-König
@ 2019-03-27 19:27     ` Marc Kleine-Budde
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 19:27 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: netdev, Jiri Pirko, Dave Taht, Jamal Hadi Salim, kernel,
	Cong Wang, linux-can, davem



On 3/27/19 7:53 PM, Uwe Kleine-König wrote:
> On Wed, Mar 27, 2019 at 05:56:31PM +0100, Marc Kleine-Budde wrote:
>> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>>
>> For example CAN.
>>
>> CAN is a multi-master serial bus standard for connecting Electronic Control
>> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
>> of payload. Frame corruption is detected by a CRC. However frame loss due to
>> corruption is possible, but a quite unusual phenomenon.
>>
>> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
>> legacy protocols on top of CAN, which are not build with flow control or high
>> CAN frame drop rates in mind.
>>
>> When using fq_codel, as soon as the queue reaches a certain delay based length,
>> skbs from the head of the queue are silently dropped. Silently meaning that the
>> user space using a send() or similar syscall doesn't get an error. However
>> TCP's flow control algorithm will detect dropped packages and adjust the
> 
> s/package/packet/ here and in a few more locations in this commit log.

Thanks, fixed.

>> bandwidth accordingly.
>>
>> When using fq_codel and sending raw frames over CAN, which is the common use
>> case, the user space thinks the package has been sent without problems, because
>> send() returned without an error. pfifo_fast will drop skbs, if the queue
>> length exceeds the maximum. But with this scheduler the skbs at the tail are
>> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
>> space can slow down the package generation.
>>
>> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
>> during compile time, or set default during runtime with sysctl
>> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
>> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
>> drop. On the other hand with fq_codel there is more then one lost CAN frame per
>> thousand frames.
>>
>> As pointed out fq_codel is not suited for CAN hardware, so this patch
>> introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
>> existing "IFF_NO_QUEUE").
>>
>> During transition of a netdev from down to up state the default queuing
>> discipline is attached by attach_default_qdiscs() with the help of
>> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
>> attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.
>>
>> [1] https://github.com/systemd/systemd/issues/9194
>>
>> Cc: Dave Taht <dave.taht@gmail.com>
>> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
>> Cc: Jiri Pirko <jiri@resnulli.us>
>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>> ---
>>  include/linux/netdevice.h | 3 +++
>>  net/sched/sch_generic.c   | 3 +++
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 166fdc0a78b4..1867e27e3369 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -1498,6 +1498,7 @@ struct net_device_ops {
>>   * @IFF_FAILOVER: device is a failover master device
>>   * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
>>   * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
>> + * @IFF_FIFO_QUEUE: device must run with FIFO qdisc attached. skb drop without NET_XMIT_DROP is fatal
> 
> Do you need the FIFO property or only that the qdisc doesn't silently
> drop packets? I don't know which other qdiscs are around, but depending
> on the answer to this question other than pfifo_fast might be suitable?

No silent dropping is mandatory. No reordering of outgoing packets (if
not configured to do so) is mandatory.

Maybe there are other qdiscs that work on CAN, but pfifo_fast is a sane
default.
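
To illustrate why the error return matters, here is a minimal user-space
sketch, assuming the qdisc reports a full queue as ENOBUFS from write().
The helper name send_frame_backoff(), the 1 ms delay and the injectable
write_fn are made up for this example. With a CAN_RAW socket the buffer
would be a struct can_frame and do_write would simply be write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Same signature as write(2), so a test double can replace the socket. */
typedef ssize_t (*write_fn)(int fd, const void *buf, size_t len);

/*
 * Hypothetical helper: send one frame and back off while the queue is
 * full (ENOBUFS). Returns 0 on success, -1 with errno set otherwise.
 */
static int send_frame_backoff(write_fn do_write, int fd, const void *frame,
			      size_t len, int max_retries)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	int attempt;

	assert(do_write != NULL && max_retries >= 0);

	for (attempt = 0; attempt <= max_retries; attempt++) {
		ssize_t n = do_write(fd, frame, len);

		if (n == (ssize_t)len)
			return 0;		/* frame was queued */
		if (n >= 0) {
			errno = EIO;		/* short write, give up */
			return -1;
		}
		if (errno != ENOBUFS)
			return -1;		/* unrelated error */
		nanosleep(&delay, NULL);	/* queue full: slow down */
	}
	errno = ENOBUFS;
	return -1;
}

/* Test double: reports a full queue twice, then accepts the frame. */
static int fake_busy_calls;

static ssize_t fake_write_busy_twice(int fd, const void *buf, size_t len)
{
	(void)fd;
	(void)buf;
	if (fake_busy_calls++ < 2) {
		errno = ENOBUFS;
		return -1;
	}
	return (ssize_t)len;
}
```

With a silent head drop none of this is possible: write() returns
success and the sender never learns that it should slow down.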

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 17:14   ` Cong Wang
@ 2019-03-27 20:11     ` Marc Kleine-Budde
  2019-03-27 20:53       ` Cong Wang
  2019-04-02 17:22       ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-03-27 20:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, Linux Kernel Network Developers, Dave Taht,
	Jamal Hadi Salim, kernel, linux-can, David Miller



On 3/27/19 6:14 PM, Cong Wang wrote:
> On Wed, Mar 27, 2019 at 9:56 AM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>
>> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>>
>> For example CAN.
>>
>> CAN is a multi-master serial bus standard for connecting Electronic Control
>> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
>> of payload. Frame corruption is detected by a CRC. However frame loss due to
>> corruption is possible, but a quite unusual phenomenon.
>>
>> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
>> legacy protocols on top of CAN, which are not build with flow control or high
>> CAN frame drop rates in mind.
>>
>> When using fq_codel, as soon as the queue reaches a certain delay based length,
>> skbs from the head of the queue are silently dropped. Silently meaning that the
>> user space using a send() or similar syscall doesn't get an error. However
>> TCP's flow control algorithm will detect dropped packages and adjust the
>> bandwidth accordingly.
>>
>> When using fq_codel and sending raw frames over CAN, which is the common use
>> case, the user space thinks the package has been sent without problems, because
>> send() returned without an error. pfifo_fast will drop skbs, if the queue
>> length exceeds the maximum. But with this scheduler the skbs at the tail are
>> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
>> space can slow down the package generation.
>>
>> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
>> during compile time, or set default during runtime with sysctl
>> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
>> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
>> drop. On the other hand with fq_codel there is more then one lost CAN frame per
>> thousand frames.
>>
>> As pointed out fq_codel is not suited for CAN hardware, so this patch
>> introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
>> existing "IFF_NO_QUEUE").
>>
>> During transition of a netdev from down to up state the default queuing
>> discipline is attached by attach_default_qdiscs() with the help of
>> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
>> attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.
> 
> I wonder if we just need to allow arbitrary default qdisc per netdevice
> while you are on it. A private flag is simply a boolean, perhaps in the
> future other type of devices wants other default qdiscs, so that could
> make it more flexible.

From my point of view there is networking hardware that uses protocols
that work with (i.e. benefit from) fq_codel (hash flow/queue/head drop).

The silent head drop is the most prominent reason why it doesn't work on
CAN. I haven't dug deep enough into the code to see if skb->hash is used
or what the flow dissector will do on CAN frames. So reordering of CAN
frames (if something other than skb->priority is used) might be a
problem, too.

From my point of view, if your networking hardware and the protocols on
top don't like re-ordering or silent head drop, then pfifo_fast is
probably a good default choice.
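
The difference between the two drop policies can be sketched with a toy
model. This is not the kernel code: fq_codel decides on queueing delay
rather than on a fixed length, and all names here (toy_queue,
enqueue_tail_drop(), enqueue_head_drop(), QLEN) are invented for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define QLEN 4	/* toy queue limit, real limits are far larger */

struct toy_queue {
	int frames[QLEN];
	int count;
	int silent_drops;	/* frames lost without telling the sender */
};

/* Tail drop: refuse the new frame and report that to the caller. */
static int enqueue_tail_drop(struct toy_queue *q, int frame)
{
	assert(q != NULL);
	if (q->count == QLEN)
		return -ENOBUFS;
	q->frames[q->count++] = frame;
	return 0;
}

/* Head drop: discard the oldest frame, accept the new one, report success. */
static int enqueue_head_drop(struct toy_queue *q, int frame)
{
	assert(q != NULL);
	if (q->count == QLEN) {
		memmove(&q->frames[0], &q->frames[1],
			(QLEN - 1) * sizeof(q->frames[0]));
		q->count--;
		q->silent_drops++;
	}
	q->frames[q->count++] = frame;
	return 0;
}
```

Pushing six frames into each queue without draining it gives two
-ENOBUFS returns in the tail-drop case, while the head-drop queue
accepts all six and has quietly lost the two oldest frames.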

I discussed the problem a bit at netdev 0x13, and one point someone
raised is that if there is a generic "set this qdisc" function, people
might start adding it to network drivers to "optimize" them for their
special workflow or test case.

Doubts aside, what should an arbitrary default qdisc per netdevice look
like? Add a string "default_qdisc" to the netdev? Look up the qdisc by
string during the DOWN->UP transition? What to do if that qdisc is not
compiled into the kernel? Or rather use an array of qdiscs with one of
the sch_generic defaults?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 20:11     ` Marc Kleine-Budde
@ 2019-03-27 20:53       ` Cong Wang
  2019-04-02 17:22       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 25+ messages in thread
From: Cong Wang @ 2019-03-27 20:53 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Jiri Pirko, Linux Kernel Network Developers, Dave Taht,
	Jamal Hadi Salim, kernel, linux-can, David Miller

On Wed, Mar 27, 2019 at 1:11 PM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> Doubts aside, how should an arbitrary default qdisc per netdevice look
> like? Add a string "default_qdisc" to the netdev? Lookup qdisc by string
> during DOWN->UP transition? What do if that qdisc is not compiled into
> the kernel? Or rather use an array of qdiscs with one of sch_generic
> defaults?

I think you can just save a Qdisc_ops pointer in netdevice,
like how we install the default qdisc:

  const struct Qdisc_ops *default_qdisc_ops = &pfifo_fast_ops;
  EXPORT_SYMBOL(default_qdisc_ops);

And hard-code whatever default into your netdevice init code.

At least for pfifo_fast, you don't need to worry about module
loading. If you really do, you can call qdisc_lookup_default()
and request_module() like what qdisc_set_default() does.
Of course, you can refactor qdisc_set_default() and call it too.
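
A rough user-space model of that idea, not a patch: the structs are
cut-down stand-ins for the kernel types, the default_qdisc field and
pick_default_qdisc() are hypothetical names, and the IFF_NO_QUEUE bit
value is arbitrary here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel types, only the fields used here. */
struct Qdisc_ops {
	const char *id;
};

struct net_device {
	unsigned int priv_flags;
	const struct Qdisc_ops *default_qdisc;	/* hypothetical field */
};

#define IFF_NO_QUEUE	(1u << 0)	/* bit value arbitrary in this model */

static const struct Qdisc_ops pfifo_fast_ops = { .id = "pfifo_fast" };
static const struct Qdisc_ops fq_codel_ops = { .id = "fq_codel" };
static const struct Qdisc_ops noqueue_qdisc_ops = { .id = "noqueue" };

/* System-wide default, here as if a distro had selected fq_codel. */
static const struct Qdisc_ops *default_qdisc_ops = &fq_codel_ops;

/* Mirrors the order of checks in attach_one_default_qdisc(). */
static const struct Qdisc_ops *pick_default_qdisc(const struct net_device *dev)
{
	const struct Qdisc_ops *ops = default_qdisc_ops;

	assert(dev != NULL);

	if (dev->default_qdisc)			/* per-device override */
		ops = dev->default_qdisc;

	if (dev->priv_flags & IFF_NO_QUEUE)	/* still wins, as today */
		ops = &noqueue_qdisc_ops;

	return ops;
}
```

A CAN driver would then set dev->default_qdisc = &pfifo_fast_ops in its
setup code, and devices that leave the pointer NULL keep following the
system-wide default.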

Thanks.


* Re: [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler
  2019-03-27 20:11     ` Marc Kleine-Budde
  2019-03-27 20:53       ` Cong Wang
@ 2019-04-02 17:22       ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 25+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-04-02 17:22 UTC (permalink / raw)
  To: Marc Kleine-Budde, Cong Wang
  Cc: Jiri Pirko, Linux Kernel Network Developers, Dave Taht,
	Jamal Hadi Salim, kernel, linux-can, David Miller

Marc Kleine-Budde <mkl@pengutronix.de> writes:

> On 3/27/19 6:14 PM, Cong Wang wrote:
>> On Wed, Mar 27, 2019 at 9:56 AM Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>>>
>>> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>>>
>>> For example CAN.
>>>
>>> CAN is a multi-master serial bus standard for connecting Electronic Control
>>> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
>>> of payload. Frame corruption is detected by a CRC. However frame loss due to
>>> corruption is possible, but a quite unusual phenomenon.
>>>
>>> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
>>> legacy protocols on top of CAN, which are not build with flow control or high
>>> CAN frame drop rates in mind.
>>>
>>> When using fq_codel, as soon as the queue reaches a certain delay based length,
>>> skbs from the head of the queue are silently dropped. Silently meaning that the
>>> user space using a send() or similar syscall doesn't get an error. However
>>> TCP's flow control algorithm will detect dropped packages and adjust the
>>> bandwidth accordingly.
>>>
>>> When using fq_codel and sending raw frames over CAN, which is the common use
>>> case, the user space thinks the package has been sent without problems, because
>>> send() returned without an error. pfifo_fast will drop skbs, if the queue
>>> length exceeds the maximum. But with this scheduler the skbs at the tail are
>>> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
>>> space can slow down the package generation.
>>>
>>> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
>>> during compile time, or set default during runtime with sysctl
>>> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
>>> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
>>> drop. On the other hand with fq_codel there is more then one lost CAN frame per
>>> thousand frames.
>>>
>>> As pointed out fq_codel is not suited for CAN hardware, so this patch
>>> introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
>>> existing "IFF_NO_QUEUE").
>>>
>>> During transition of a netdev from down to up state the default queuing
>>> discipline is attached by attach_default_qdiscs() with the help of
>>> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
>>> attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.
>> 
>> I wonder if we just need to allow arbitrary default qdisc per netdevice
>> while you are on it. A private flag is simply a boolean, perhaps in the
>> future other type of devices wants other default qdiscs, so that could
>> make it more flexible.
>
> From my point of view there is networking hardware that use protocols
> that work with (i.e. benefit from) fq_codel (hash flow/queue/head drop).
>
> The silent head drop is the most prominent reason why it doesn't work on
> CAN. I haven't dug deep enough into the code to see if skb->hash is used
> or what the flow dissector will do on CAN frames. So reordering of CAN
> frames (if something else than skb->priority is used) might be a
> problem, too.
>
> From my point of view, if your networking hardware and the protocols on
> top don't like re-ordering or silent head drop, than pfifo_fast is
> probably a good default choice.
>
> I discussed the problem a bit at netdev 0x13 and one point someone
> mentioned is that if there is a generic set this qdisc function people
> might start to add this to network drivers to "optimize" them for
> their special workflow or test case.

I think I was one of the people you spoke with about this. I agree that
the flag approach makes sense, since I view the requirements of the CAN
protocol as very specifically being met by a FIFO queue.

And yeah I do think we should push back on every device type defining
its own arbitrary qdisc default; having the two very specific
exceptions "no queue" and "FIFO queue" to the general qdisc default
setting makes it explicit that this is for special cases only, and that
any other optimisation of the qdisc configuration should be done in
userspace.
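
As a rough illustration of the flag approach discussed here, the following
userspace C sketch models how a default qdisc could be picked when only two
special-case flags are allowed to override the system-wide default. It is not
the kernel code: the MODEL_* names, their bit values and
model_default_qdisc() are hypothetical and exist only in this example, and
IFF_FIFO_QUEUE itself is only proposed in the RFC patch.

```c
/* Hypothetical flag bits for this userspace model only. They do not
 * match the kernel's netdev_priv_flags values. */
#define MODEL_IFF_NO_QUEUE   (1u << 0)
#define MODEL_IFF_FIFO_QUEUE (1u << 1)

enum model_qdisc {
	MODEL_QDISC_NOQUEUE,		/* device wants no queue at all */
	MODEL_QDISC_PFIFO_FAST,		/* device needs a plain FIFO */
	MODEL_QDISC_SYSTEM_DEFAULT,	/* whatever the admin configured */
};

/* Pick the default qdisc for a device from its private flags.
 * Only the two explicit exceptions override the system default. */
enum model_qdisc model_default_qdisc(unsigned int priv_flags)
{
	if (priv_flags & MODEL_IFF_NO_QUEUE)
		return MODEL_QDISC_NOQUEUE;
	if (priv_flags & MODEL_IFF_FIFO_QUEUE)
		return MODEL_QDISC_PFIFO_FAST;
	return MODEL_QDISC_SYSTEM_DEFAULT;
}
```

In this model a device without either flag always follows the system
default, which is the point of keeping the exceptions to two fixed cases.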

-Toke


* [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
                   ` (2 preceding siblings ...)
  2019-03-27 18:30 ` [RFC] net: sch_generic: fq_codel vs pfifo_fast Stephen Hemminger
@ 2019-10-22 12:47 ` Vincent Prince
  2019-10-22 12:58   ` Marc Kleine-Budde
  2019-10-22 13:23 ` [PATCH v2] " Vincent Prince
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Vincent Prince @ 2019-10-22 12:47 UTC (permalink / raw)
  To: mkl
  Cc: dave.taht, davem, jhs, jiri, kernel, linux-can, netdev,
	xiyou.wangcong, Vincent Prince

Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
---
 net/sched/sch_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 77b289d..bff43de 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
+        else if(dev->type == ARPHRD_CAN)
+		ops = &pfifo_fast_ops;
 
 	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
 	if (!qdisc) {
-- 
2.7.4



* Re: [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 12:47 ` [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware Vincent Prince
@ 2019-10-22 12:58   ` Marc Kleine-Budde
  0 siblings, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-10-22 12:58 UTC (permalink / raw)
  To: Vincent Prince
  Cc: jiri, jhs, netdev, dave.taht, linux-can, kernel, xiyou.wangcong, davem


On 10/22/19 2:47 PM, Vincent Prince wrote:
> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>

Please add the patch description, preferably the one from my original patch.

> ---
>  net/sched/sch_generic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 77b289d..bff43de 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
>  
>  	if (dev->priv_flags & IFF_NO_QUEUE)
>  		ops = &noqueue_qdisc_ops;
> +        else if(dev->type == ARPHRD_CAN)
^^^^^^^^^^^^^^^
use one tab to indent.

> +		ops = &pfifo_fast_ops;
>  
>  	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
>  	if (!qdisc) {
> 

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
                   ` (3 preceding siblings ...)
  2019-10-22 12:47 ` [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware Vincent Prince
@ 2019-10-22 13:23 ` Vincent Prince
  2019-10-22 14:53   ` Marc Kleine-Budde
  2019-10-22 15:09 ` [PATCH v3] " Vincent Prince
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Vincent Prince @ 2019-10-22 13:23 UTC (permalink / raw)
  To: mkl
  Cc: dave.taht, davem, jhs, jiri, kernel, linux-can, netdev,
	xiyou.wangcong, Vincent Prince

Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
---
Changes in v2:
 - reformat patch

 net/sched/sch_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 77b289d..dfb2982 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
+	else if(dev->type == ARPHRD_CAN)
+		ops = &pfifo_fast_ops;
 
 	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
 	if (!qdisc) {
-- 
2.7.4



* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 13:23 ` [PATCH v2] " Vincent Prince
@ 2019-10-22 14:53   ` Marc Kleine-Budde
  2019-10-22 14:55     ` Marc Kleine-Budde
  2019-10-22 16:42     ` Stephen Hemminger
  0 siblings, 2 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-10-22 14:53 UTC (permalink / raw)
  To: Vincent Prince
  Cc: jiri, jhs, netdev, dave.taht, linux-can, kernel, xiyou.wangcong, davem


On 10/22/19 3:23 PM, Vincent Prince wrote:
> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>

Please include a patch description, e.g. this one:

-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------
There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packets and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the frame has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that the user
space can slow down the frame generation.

On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more than one lost CAN frame per
thousand frames.

As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".
-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------
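
The description above relies on user space seeing -ENOBUFS from send() and
slowing down. A minimal sketch of what such a sender could look like follows.
The names classify_send(), send_frame_throttled() and the wait_fn hook are
made up for this example and are not part of any existing API; the only real
calls used are send() and errno. How long to wait between attempts is left to
the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum tx_action { TX_DONE, TX_RETRY, TX_FAIL };

/* Decide what to do after one send() attempt of a len-byte frame. */
enum tx_action classify_send(ssize_t ret, int err, size_t len,
			     unsigned int attempt, unsigned int max_retries)
{
	if (ret >= 0)
		return (size_t)ret == len ? TX_DONE : TX_FAIL;
	if (err == ENOBUFS && attempt < max_retries)
		return TX_RETRY;	/* queue full: back off, try again */
	return TX_FAIL;			/* hard error or retries exhausted */
}

/*
 * Send one frame on fd. While the queue reports ENOBUFS, call
 * wait_fn(attempt) (if given) and retry, up to max_retries times.
 * Returns 0 on success, -1 on failure with errno left as set by send().
 */
int send_frame_throttled(int fd, const void *frame, size_t len,
			 unsigned int max_retries,
			 void (*wait_fn)(unsigned int attempt))
{
	unsigned int attempt;

	for (attempt = 0; ; attempt++) {
		ssize_t ret = send(fd, frame, len, 0);

		switch (classify_send(ret, errno, len, attempt, max_retries)) {
		case TX_DONE:
			return 0;
		case TX_RETRY:
			if (wait_fn)
				wait_fn(attempt);
			break;
		case TX_FAIL:
			return -1;
		}
	}
}
```

This only helps with a tail-dropping qdisc. With a silent head drop every
send() reports success, so the retry path is never taken.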

Marc


* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 14:53   ` Marc Kleine-Budde
@ 2019-10-22 14:55     ` Marc Kleine-Budde
  2019-10-22 16:42     ` Stephen Hemminger
  1 sibling, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-10-22 14:55 UTC (permalink / raw)
  To: Vincent Prince
  Cc: jiri, dave.taht, netdev, jhs, linux-can, kernel, xiyou.wangcong, davem


On 10/22/19 4:53 PM, Marc Kleine-Budde wrote:
> On 10/22/19 3:23 PM, Vincent Prince wrote:
>> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
> 
> please include a patch description. I.e. this one:
> 
> -------->8-------->8-------->8-------->8-------->8-------->8-------->8--------
> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
> 
> For example CAN.
> 
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. However frame loss due to
> corruption is possible, but a quite unusual phenomenon.
> 
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not build with flow control or high
> CAN frame drop rates in mind.
> 
> When using fq_codel, as soon as the queue reaches a certain delay based length,
> skbs from the head of the queue are silently dropped. Silently meaning that the
> user space using a send() or similar syscall doesn't get an error. However
> TCP's flow control algorithm will detect dropped packages and adjust the
> bandwidth accordingly.
> 
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, the user space thinks the package has been sent without problems, because
> send() returned without an error. pfifo_fast will drop skbs, if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
> space can slow down the package generation.
> 
> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
> during compile time, or set default during runtime with sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
> drop. On the other hand with fq_codel there is more then one lost CAN frame per
> thousand frames.
> 
> As pointed out fq_codel is not suited for CAN hardware, so this patch changes
> attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
> 
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach the pfifo_fast (pfifo_fast_ops) if the network device type is
> "ARPHRD_CAN".
> -------->8-------->8-------->8-------->8-------->8-------->8-------->8--------

Doh, also include the footnote:

-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------
[1] https://github.com/systemd/systemd/issues/9194
-------->8-------->8-------->8-------->8-------->8-------->8-------->8--------

Marc


* [PATCH v3] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
                   ` (4 preceding siblings ...)
  2019-10-22 13:23 ` [PATCH v2] " Vincent Prince
@ 2019-10-22 15:09 ` Vincent Prince
  2019-10-23 10:52 ` [PATCH v4] " Vincent Prince
  2019-10-23 13:44 ` [PATCH v5] " Vincent Prince
  7 siblings, 0 replies; 25+ messages in thread
From: Vincent Prince @ 2019-10-22 15:09 UTC (permalink / raw)
  To: mkl
  Cc: dave.taht, davem, jhs, jiri, kernel, linux-can, netdev,
	xiyou.wangcong, Vincent Prince

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packets and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the frame has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that the user
space can slow down the frame generation.

On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more than one lost CAN frame per
thousand frames.

As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".

[1] https://github.com/systemd/systemd/issues/9194

Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
---
Changes in v3:
 - add description

Changes in v2:
 - reformat patch

 net/sched/sch_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 77b289d..dfb2982 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
+	else if(dev->type == ARPHRD_CAN)
+		ops = &pfifo_fast_ops;
 
 	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
 	if (!qdisc) {
-- 
2.7.4



* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 14:53   ` Marc Kleine-Budde
  2019-10-22 14:55     ` Marc Kleine-Budde
@ 2019-10-22 16:42     ` Stephen Hemminger
  2019-10-22 16:48       ` Marc Kleine-Budde
  2019-10-22 17:28       ` Eric Dumazet
  1 sibling, 2 replies; 25+ messages in thread
From: Stephen Hemminger @ 2019-10-22 16:42 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Vincent Prince, jiri, jhs, netdev, dave.taht, linux-can, kernel,
	xiyou.wangcong, davem

On Tue, 22 Oct 2019 16:53:44 +0200
Marc Kleine-Budde <mkl@pengutronix.de> wrote:

> On 10/22/19 3:23 PM, Vincent Prince wrote:
> > Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>  
> 
> please include a patch description. I.e. this one:
> 
> -------->8-------->8-------->8-------->8-------->8-------->8-------->8--------  
> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
> 
> For example CAN.
> 
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. However frame loss due to
> corruption is possible, but a quite unusual phenomenon.
> 
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not build with flow control or high
> CAN frame drop rates in mind.
> 
> When using fq_codel, as soon as the queue reaches a certain delay based length,
> skbs from the head of the queue are silently dropped. Silently meaning that the
> user space using a send() or similar syscall doesn't get an error. However
> TCP's flow control algorithm will detect dropped packages and adjust the
> bandwidth accordingly.
> 
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, the user space thinks the package has been sent without problems, because
> send() returned without an error. pfifo_fast will drop skbs, if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped, an error (-ENOBUFS) is propagated to user space. So that the user
> space can slow down the package generation.
> 
> On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
> during compile time, or set default during runtime with sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of million CAN frames without a frame
> drop. On the other hand with fq_codel there is more then one lost CAN frame per
> thousand frames.
> 
> As pointed out fq_codel is not suited for CAN hardware, so this patch changes
> attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
> 
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach the pfifo_fast (pfifo_fast_ops) if the network device type is
> "ARPHRD_CAN".
> -------->8-------->8-------->8-------->8-------->8-------->8-------->8--------  
> 
> Marc
> 

Why not fix fq_codel to return the same errors as other qdisc?


* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 16:42     ` Stephen Hemminger
@ 2019-10-22 16:48       ` Marc Kleine-Budde
  2019-10-22 17:28       ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: Marc Kleine-Budde @ 2019-10-22 16:48 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Vincent Prince, jiri, jhs, netdev, dave.taht, linux-can, kernel,
	xiyou.wangcong, davem


On 10/22/19 6:42 PM, Stephen Hemminger wrote:
> On Tue, 22 Oct 2019 16:53:44 +0200
> Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> 
>> On 10/22/19 3:23 PM, Vincent Prince wrote:
>>> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>  
>>
>> please include a patch description. I.e. this one:
>>
>> -------->8-------->8-------->8-------->8-------->8-------->8-------->8--------  
>> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>>
>> For example CAN.
>>
>> CAN is a multi-master serial bus standard for connecting Electronic Control
>> Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
>> of payload. Frame corruption is detected by a CRC. However frame loss due to
>> corruption is possible, but a quite unusual phenomenon.
>>
>> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
>> legacy protocols on top of CAN, which are not build with flow control or high
>> CAN frame drop rates in mind.
>>
>> When using fq_codel, as soon as the queue reaches a certain delay based length,
>> skbs from the head of the queue are silently dropped. Silently meaning that the
                 ^^^^^^^^^^^^^^^^^
>> user space using a send() or similar syscall doesn't get an error. However
>> TCP's flow control algorithm will detect dropped packages and adjust the
>> bandwidth accordingly.

> Why not fix fq_codel to return the same errors as other qdisc?

The head drop is the problem. After a send() system call has returned to
user space, one would not expect a later send() to knock an earlier
frame out of the queue.

It's too late to throttle the frame generation, as one frame is lost
already.
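
To make the difference concrete, here is a small userspace model of the two
drop policies. It is a deliberate simplification: the queue has a fixed
length of four frames, whereas fq_codel decides on queueing delay, and the
names model_queue, enqueue_tail_drop() and enqueue_head_drop() are invented
for this sketch. What it shows is only who learns about the drop.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_QLEN 4	/* deliberately tiny so overflow is easy to hit */

struct model_queue {
	int frames[MODEL_QLEN];		/* frame ids, oldest first */
	size_t len;
	unsigned int silent_drops;	/* drops the sender never saw */
};

/* Tail drop: the new frame is rejected and the caller is told. */
int enqueue_tail_drop(struct model_queue *q, int frame)
{
	if (q->len == MODEL_QLEN)
		return -ENOBUFS;
	q->frames[q->len++] = frame;
	return 0;
}

/* Head drop: the oldest frame is evicted, the caller sees success. */
int enqueue_head_drop(struct model_queue *q, int frame)
{
	if (q->len == MODEL_QLEN) {
		memmove(q->frames, q->frames + 1,
			(MODEL_QLEN - 1) * sizeof(q->frames[0]));
		q->len--;
		q->silent_drops++;
	}
	q->frames[q->len++] = frame;
	return 0;
}
```

With tail drop the fifth enqueue fails and frames 0..3 stay queued. With
head drop the fifth enqueue succeeds, but frame 0, for which an earlier call
already returned success, is gone.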

Marc


* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 16:42     ` Stephen Hemminger
  2019-10-22 16:48       ` Marc Kleine-Budde
@ 2019-10-22 17:28       ` Eric Dumazet
  2019-10-22 18:21         ` Oliver Hartkopp
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2019-10-22 17:28 UTC (permalink / raw)
  To: Stephen Hemminger, Marc Kleine-Budde
  Cc: Vincent Prince, jiri, jhs, netdev, dave.taht, linux-can, kernel,
	xiyou.wangcong, davem



On 10/22/19 9:42 AM, Stephen Hemminger wrote:

> Why not fix fq_codel to return the same errors as other qdisc?
> 

I believe the same problem would happen with any qdisc not doing tail drops.

Do we really want to enforce all qdisc to have a common drop strategy ?

For example, FQ could implement a strategy dropping the oldest packet in the queue,
which is absolutely not related to the enqueue order.




* Re: [PATCH v2] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-22 17:28       ` Eric Dumazet
@ 2019-10-22 18:21         ` Oliver Hartkopp
  0 siblings, 0 replies; 25+ messages in thread
From: Oliver Hartkopp @ 2019-10-22 18:21 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger, Marc Kleine-Budde
  Cc: Vincent Prince, jiri, jhs, netdev, dave.taht, linux-can, kernel,
	xiyou.wangcong, davem


On 22/10/2019 19.28, Eric Dumazet wrote:
> On 10/22/19 9:42 AM, Stephen Hemminger wrote:
> 
>> Why not fix fq_codel to return the same errors as other qdisc?
>>
> 
> I believe the same problem would happen with any qdisc not doing tail drops.
> 
> Do we really want to enforce all qdisc to have a common drop strategy ?

CAN has no drop strategy. There is no transport protocol at this point
which can handle and compensate for drops. CAN is just about PDUs that are
sent on a special medium.

And that's what this patch was about.

> 
> For example, FQ could implement a strategy dropping the oldest packet in the queue,
> which is absolutely not related to the enqueue order.
> 

We have a CAN ID related ematch rule in em_canid.c to be able to 
configure HTBs, delays or drops. But this is for testing and special 
use-cases. Silently dropped frames are not expected.

Regards,
Oliver





* [PATCH v4] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
                   ` (5 preceding siblings ...)
  2019-10-22 15:09 ` [PATCH v3] " Vincent Prince
@ 2019-10-23 10:52 ` Vincent Prince
  2019-10-23 11:13   ` Dave Taht
  2019-10-23 13:44 ` [PATCH v5] " Vincent Prince
  7 siblings, 1 reply; 25+ messages in thread
From: Vincent Prince @ 2019-10-23 10:52 UTC (permalink / raw)
  To: mkl
  Cc: dave.taht, davem, jhs, jiri, kernel, linux-can, netdev,
	xiyou.wangcong, Vincent Prince

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packets and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the frame has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that the user
space can slow down the frame generation.

On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more than one lost CAN frame per
thousand frames.

As pointed out fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".

[1] https://github.com/systemd/systemd/issues/9194

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
---
Changes in v4: 
 - add Marc credit to commit log
 
Changes in v3:
 - add description

Changes in v2:
 - reformat patch

 net/sched/sch_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 77b289d..dfb2982 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
+	else if(dev->type == ARPHRD_CAN)
+		ops = &pfifo_fast_ops;
 
 	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
 	if (!qdisc) {
-- 
2.7.4



* Re: [PATCH v4] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-23 10:52 ` [PATCH v4] " Vincent Prince
@ 2019-10-23 11:13   ` Dave Taht
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Taht @ 2019-10-23 11:13 UTC (permalink / raw)
  To: Vincent Prince
  Cc: Marc Kleine-Budde, David S. Miller, Jamal Hadi Salim,
	Jiří Pírko, kernel, linux-can,
	Linux Kernel Network Developers, Cong Wang

On Wed, Oct 23, 2019 at 3:52 AM Vincent Prince
<vincent.prince.fr@gmail.com> wrote:
>
> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
>
> For example CAN.
>
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units (ECUs), also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. Frame loss due to corruption
> is possible, but quite unusual.
>
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not built with flow control or high
> CAN frame drop rates in mind.
>
> When using fq_codel, as soon as the queue reaches a certain delay-based length,
> skbs from the head of the queue are silently dropped. Silently means that user
> space, using send() or a similar syscall, doesn't get an error. TCP's flow
> control, however, detects dropped packets and adjusts the bandwidth
> accordingly.
>
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, user space thinks the frame has been sent without problems, because
> send() returned without an error. pfifo_fast also drops skbs if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped and an error (-ENOBUFS) is propagated to user space, so that user
> space can slow down frame generation.
>
> On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
> at compile time, or set as the default at runtime with the sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of millions of CAN frames without a
> frame drop. With fq_codel, on the other hand, more than one CAN frame per
> thousand is lost.
>
> As pointed out, fq_codel is not suited for CAN hardware, so this patch changes
> attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
>
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach pfifo_fast (pfifo_fast_ops) if the network device type is
> "ARPHRD_CAN".
>
> [1] https://github.com/systemd/systemd/issues/9194
>
> Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
> ---
> Changes in v4:
>  - add Marc credit to commit log
>
> Changes in v3:
>  - add description
>
> Changes in v2:
>  - reformat patch
>
>  net/sched/sch_generic.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 77b289d..dfb2982 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
>
>         if (dev->priv_flags & IFF_NO_QUEUE)
>                 ops = &noqueue_qdisc_ops;
> +       else if(dev->type == ARPHRD_CAN)
> +               ops = &pfifo_fast_ops;
>
>         qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
>         if (!qdisc) {
> --
> 2.7.4
>

While I'm delighted to see such a simple patch emerge, OpenWrt patched out
pfifo_fast long ago. pfifo_fast has additional semantics not needed in the
CAN use case either (I think), and "pfifo" is fine, but sure, pfifo_fast if
you must.

Anyway, regardless, that's an easy fix and I hope this fix goes to
stable, as I've had nightmares about cars exploding due to out-of-order
CAN bus operations ever since I learned of this bug.

Acked-by: Dave Taht <dave.taht@gmail.com>

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740


* [PATCH v5] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
                   ` (6 preceding siblings ...)
  2019-10-23 10:52 ` [PATCH v4] " Vincent Prince
@ 2019-10-23 13:44 ` Vincent Prince
  2019-10-26  2:20   ` David Miller
  7 siblings, 1 reply; 25+ messages in thread
From: Vincent Prince @ 2019-10-23 13:44 UTC (permalink / raw)
  To: mkl
  Cc: dave.taht, davem, jhs, jiri, kernel, linux-can, netdev,
	xiyou.wangcong, Vincent Prince

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units (ECUs), also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. Frame loss due to corruption
is possible, but quite unusual.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not built with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay-based length,
skbs from the head of the queue are silently dropped. Silently means that user
space, using send() or a similar syscall, doesn't get an error. TCP's flow
control, however, detects dropped packets and adjusts the bandwidth
accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, user space thinks the frame has been sent without problems, because
send() returned without an error. pfifo_fast also drops skbs if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped and an error (-ENOBUFS) is propagated to user space, so that user
space can slow down frame generation.
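
The difference between the two drop policies can be illustrated with a toy
model. This is only a sketch and not kernel code: struct toy_queue, QLEN and
both enqueue helpers are hypothetical names, and the head drop is triggered
here by a full queue, whereas fq_codel decides based on queuing delay. The
point is only who learns about the drop: tail drop returns -ENOBUFS to the
caller, head drop returns success and loses the oldest frame.

```c
#include <errno.h>

#define QLEN 4	/* hypothetical queue capacity, chosen for illustration */

struct toy_queue {
	int frames[QLEN];
	int head;	/* index of the oldest queued frame */
	int count;	/* number of queued frames */
};

/* Tail drop: a full queue rejects the new frame and tells the caller. */
int enqueue_tail_drop(struct toy_queue *q, int frame)
{
	if (q->count == QLEN)
		return -ENOBUFS;
	q->frames[(q->head + q->count) % QLEN] = frame;
	q->count++;
	return 0;
}

/* Head drop: a full queue discards its oldest frame and reports success. */
int enqueue_head_drop(struct toy_queue *q, int frame)
{
	if (q->count == QLEN) {
		q->head = (q->head + 1) % QLEN;	/* oldest frame is lost */
		q->count--;
	}
	q->frames[(q->head + q->count) % QLEN] = frame;
	q->count++;
	return 0;
}
```

With the tail-drop variant the sender can react to the error; with the
head-drop variant the sender never sees that an earlier frame is gone.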

On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
at compile time, or set as the default at runtime with the sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of millions of CAN frames without a
frame drop. With fq_codel, on the other hand, more than one CAN frame per
thousand is lost.
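
To check whether a system is affected, the runtime default can be read from
procfs. The following is a minimal sketch: the helper names read_first_line()
and default_qdisc_is_fq_codel() are made up for this example, and it assumes
that the sysctl is exposed as /proc/sys/net/core/default_qdisc. It only
reports the system-wide default, not the qdisc actually attached to a given
CAN interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a file into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if the default qdisc is fq_codel, 0 if not, -1 on error. */
int default_qdisc_is_fq_codel(void)
{
	char name[32];

	if (read_first_line("/proc/sys/net/core/default_qdisc",
			    name, sizeof(name)))
		return -1;
	return strcmp(name, "fq_codel") == 0;
}
```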

As pointed out, fq_codel is not suited for CAN hardware, so this patch changes
attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach pfifo_fast (pfifo_fast_ops) if the network device type is
"ARPHRD_CAN".

[1] https://github.com/systemd/systemd/issues/9194

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
Acked-by: Dave Taht <dave.taht@gmail.com>
---
Changes in v5:
 - add previous ack

Changes in v4: 
 - add Marc credit to commit log
 
Changes in v3:
 - add description

Changes in v2:
 - reformat patch

 net/sched/sch_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 77b289d..dfb2982 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1008,6 +1008,8 @@ static void attach_one_default_qdisc(struct net_device *dev,
 
 	if (dev->priv_flags & IFF_NO_QUEUE)
 		ops = &noqueue_qdisc_ops;
+	else if(dev->type == ARPHRD_CAN)
+		ops = &pfifo_fast_ops;
 
 	qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
 	if (!qdisc) {
-- 
2.7.4
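
On the user-space side, the -ENOBUFS error described in the commit message
could be handled roughly as sketched below. This is only an illustration under
assumptions: send_frame_backoff() is a hypothetical helper, the 1 ms starting
delay and the 64 ms cap are arbitrary, and fd would typically be a CAN_RAW
socket with frame pointing to a struct can_frame. The sketch itself uses
nothing CAN-specific, only write() and poll().

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one frame to fd. While the kernel reports ENOBUFS (transmit queue
 * full), wait and retry with a doubling delay. Returns 0 on success and -1
 * with errno set on any other error, on a short write, or once max_retries
 * retries have been used up.
 */
int send_frame_backoff(int fd, const void *frame, size_t len, int max_retries)
{
	int delay_ms = 1;	/* arbitrary starting delay */

	for (int attempt = 0; ; attempt++) {
		ssize_t n = write(fd, frame, len);

		if (n == (ssize_t)len)
			return 0;
		if (n >= 0) {
			errno = EIO;	/* short write, treated as an error */
			return -1;
		}
		if (errno != ENOBUFS || attempt >= max_retries)
			return -1;

		/* poll() without descriptors acts as a millisecond sleep */
		poll(NULL, 0, delay_ms);
		if (delay_ms < 64)
			delay_ms *= 2;
	}
}
```

A sender that sees the helper fail with ENOBUFS after all retries knows that
frames are being generated faster than the bus can take them, which is the
feedback that is missing when frames are dropped silently.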



* Re: [PATCH v5] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware
  2019-10-23 13:44 ` [PATCH v5] " Vincent Prince
@ 2019-10-26  2:20   ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2019-10-26  2:20 UTC (permalink / raw)
  To: vincent.prince.fr
  Cc: mkl, dave.taht, jhs, jiri, kernel, linux-can, netdev, xiyou.wangcong

From: Vincent Prince <vincent.prince.fr@gmail.com>
Date: Wed, 23 Oct 2019 15:44:20 +0200

> There is networking hardware that isn't based on Ethernet for layers 1 and 2.
> 
> For example CAN.
> 
> CAN is a multi-master serial bus standard for connecting Electronic Control
> Units (ECUs), also known as nodes. A frame on the CAN bus carries up to 8 bytes
> of payload. Frame corruption is detected by a CRC. Frame loss due to corruption
> is possible, but quite unusual.
> 
> While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
> legacy protocols on top of CAN, which are not built with flow control or high
> CAN frame drop rates in mind.
> 
> When using fq_codel, as soon as the queue reaches a certain delay-based length,
> skbs from the head of the queue are silently dropped. Silently means that user
> space, using send() or a similar syscall, doesn't get an error. TCP's flow
> control, however, detects dropped packets and adjusts the bandwidth
> accordingly.
> 
> When using fq_codel and sending raw frames over CAN, which is the common use
> case, user space thinks the frame has been sent without problems, because
> send() returned without an error. pfifo_fast also drops skbs if the queue
> length exceeds the maximum. But with this scheduler the skbs at the tail are
> dropped and an error (-ENOBUFS) is propagated to user space, so that user
> space can slow down frame generation.
> 
> On distributions where fq_codel is made the default via CONFIG_DEFAULT_NET_SCH
> at compile time, or set as the default at runtime with the sysctl
> net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
> with pfifo_fast, I can transfer thousands of millions of CAN frames without a
> frame drop. With fq_codel, on the other hand, more than one CAN frame per
> thousand is lost.
> 
> As pointed out, fq_codel is not suited for CAN hardware, so this patch changes
> attach_one_default_qdisc() to use pfifo_fast for "ARPHRD_CAN" network devices.
> 
> During transition of a netdev from down to up state the default queuing
> discipline is attached by attach_default_qdiscs() with the help of
> attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
> attach pfifo_fast (pfifo_fast_ops) if the network device type is
> "ARPHRD_CAN".
> 
> [1] https://github.com/systemd/systemd/issues/9194
> 
> Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
> Signed-off-by: Vincent Prince <vincent.prince.fr@gmail.com>
> Acked-by: Dave Taht <dave.taht@gmail.com>

Applied to net-next.


end of thread, newest message: 2019-10-26  2:20 UTC

Thread overview: 25+ messages
2019-03-27 16:56 [RFC] net: sch_generic: fq_codel vs pfifo_fast Marc Kleine-Budde
2019-03-27 16:56 ` [PATCH 1/2] net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as default scheduler Marc Kleine-Budde
2019-03-27 17:14   ` Cong Wang
2019-03-27 20:11     ` Marc Kleine-Budde
2019-03-27 20:53       ` Cong Wang
2019-04-02 17:22       ` Toke Høiland-Jørgensen
2019-03-27 18:53   ` Uwe Kleine-König
2019-03-27 19:27     ` Marc Kleine-Budde
2019-03-27 16:56 ` [PATCH 2/2] can: dev: let all CAN devices " Marc Kleine-Budde
2019-03-27 18:30 ` [RFC] net: sch_generic: fq_codel vs pfifo_fast Stephen Hemminger
2019-03-27 19:24   ` Marc Kleine-Budde
2019-10-22 12:47 ` [PATCH] net: sch_generic: Use pfifo_fast as fallback scheduler for CAN hardware Vincent Prince
2019-10-22 12:58   ` Marc Kleine-Budde
2019-10-22 13:23 ` [PATCH v2] " Vincent Prince
2019-10-22 14:53   ` Marc Kleine-Budde
2019-10-22 14:55     ` Marc Kleine-Budde
2019-10-22 16:42     ` Stephen Hemminger
2019-10-22 16:48       ` Marc Kleine-Budde
2019-10-22 17:28       ` Eric Dumazet
2019-10-22 18:21         ` Oliver Hartkopp
2019-10-22 15:09 ` [PATCH v3] " Vincent Prince
2019-10-23 10:52 ` [PATCH v4] " Vincent Prince
2019-10-23 11:13   ` Dave Taht
2019-10-23 13:44 ` [PATCH v5] " Vincent Prince
2019-10-26  2:20   ` David Miller
