* [PATCH net-next] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: James Yonan @ 2022-06-24 16:54 UTC
  To: netdev; +Cc: therbert, James Yonan

rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS
(receive flow steering) to move a flow to a new CPU even
if the old CPU queue has pending packets.  Note that this
can result in packets being delivered out-of-order.  If set
to 0 (the default), the previous behavior is retained, where
flows will not be moved as long as pending packets remain.

The motivation for this patch is that while it's good to
prevent out-of-order packets, the current RFS logic requires
that all previous packets for the flow have been dequeued
before an RFS CPU switch is made, so as to preserve in-order
delivery.  In some cases, on links with heavy VPN traffic,
we have observed that this requirement is too onerous, and
that it prevents an RFS CPU switch from occurring within a
reasonable time frame if heavy traffic causes the old CPU
queue to never fully drain.

So rps_allow_ooo allows the user to select the tradeoff
between a more aggressive RFS steering policy that may
reorder packets on a CPU switch event (rps_allow_ooo=1)
vs. one that prioritizes in-order delivery (rps_allow_ooo=0).
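
For example (illustrative only, assuming this patch is applied), the
flag can be flipped from userspace with a plain write to the procfs
file -- the small snippet below is equivalent to
"echo 1 > /proc/sys/net/core/rps_allow_ooo":

  /* enable rps_allow_ooo from userspace (illustrative sketch only) */
  #include <fcntl.h>
  #include <unistd.h>

  int main(void)
  {
  	int fd = open("/proc/sys/net/core/rps_allow_ooo", O_WRONLY);

  	if (fd < 0)
  		return 1;
  	if (write(fd, "1", 1) != 1) {
  		close(fd);
  		return 1;
  	}
  	close(fd);
  	return 0;
  }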

Signed-off-by: James Yonan <james@openvpn.net>
---
 Documentation/networking/scaling.rst | 12 ++++++++++++
 include/linux/netdevice.h            |  1 +
 net/core/dev.c                       |  5 ++++-
 net/core/sysctl_net_core.c           |  9 +++++++++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index 3d435caa3ef2..371b9aaddf81 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -313,6 +313,18 @@ there are no packets outstanding on the old CPU, as the outstanding
 packets could arrive later than those about to be processed on the new
 CPU.
 
+However, in some cases it may be desirable to relax the requirement
+that a flow only moves to a new CPU when there are no packets
+outstanding on the old CPU.  For this, we introduce::
+
+  /proc/sys/net/core/rps_allow_ooo (0|1, default=0)
+
+If set to 1, allow RFS to move a flow to a new CPU even if the old CPU
+queue has pending packets.  If set to 0 (the default), flows will not be
+moved as long as pending packets remain.  So rps_allow_ooo allows the
+user to select the tradeoff between a more aggressive steering algorithm
+that can potentially reorder packets (rps_allow_ooo=1) vs. one that
+prioritizes in-order delivery (rps_allow_ooo=0).
 
 RFS Configuration
 -----------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 89afa4f7747d..556efe223c17 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -715,6 +715,7 @@ struct rps_sock_flow_table {
 
 #define RPS_NO_CPU 0xffff
 
+extern int rps_allow_ooo_sysctl;
 extern u32 rps_cpu_mask;
 extern struct rps_sock_flow_table __rcu *rps_sock_flow_table;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 978ed0622d8f..621b2cc311a6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4379,6 +4379,8 @@ struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
+int rps_allow_ooo_sysctl __read_mostly;
+EXPORT_SYMBOL(rps_allow_ooo_sysctl);
 
 struct static_key_false rps_needed __read_mostly;
 EXPORT_SYMBOL(rps_needed);
@@ -4494,6 +4496,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		 * If the desired CPU (where last recvmsg was done) is
 		 * different from current CPU (one in the rx-queue flow
 		 * table entry), switch if one of the following holds:
+		 *   - rps_allow_ooo_sysctl is enabled.
 		 *   - Current CPU is unset (>= nr_cpu_ids).
 		 *   - Current CPU is offline.
 		 *   - The current CPU's queue tail has advanced beyond the
@@ -4502,7 +4505,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		 *     have been dequeued, thus preserving in order delivery.
 		 */
 		if (unlikely(tcpu != next_cpu) &&
-		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
+		    (rps_allow_ooo_sysctl || tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
 		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
 		      rflow->last_qtail)) >= 0)) {
 			tcpu = next_cpu;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 71a13596ea2b..4fd9e61c2adf 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -470,6 +470,15 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= rps_sock_flow_sysctl
 	},
+	{
+		.procname	= "rps_allow_ooo",
+		.data		= &rps_allow_ooo_sysctl,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE
+	},
 #endif
 #ifdef CONFIG_NET_FLOW_LIMIT
 	{
-- 
2.25.1



* Re: [PATCH net-next] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: Stephen Hemminger @ 2022-06-24 17:05 UTC
  To: James Yonan; +Cc: netdev, therbert

On Fri, 24 Jun 2022 10:54:47 -0600
James Yonan <james@openvpn.net> wrote:

> @@ -4494,6 +4496,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
>  		 * If the desired CPU (where last recvmsg was done) is
>  		 * different from current CPU (one in the rx-queue flow
>  		 * table entry), switch if one of the following holds:
> +		 *   - rps_allow_ooo_sysctl is enabled.
>  		 *   - Current CPU is unset (>= nr_cpu_ids).
>  		 *   - Current CPU is offline.
>  		 *   - The current CPU's queue tail has advanced beyond the
> @@ -4502,7 +4505,7 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
>  		 *     have been dequeued, thus preserving in order delivery.
>  		 */
>  		if (unlikely(tcpu != next_cpu) &&
> -		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
> +		    (rps_allow_ooo_sysctl || tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
>  		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
>  		      rflow->last_qtail)) >= 0)) {

This conditional is getting complex; maybe an inline helper function would
be clearer for developers who decide to add more conditions in the future.


* [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: James Yonan @ 2022-06-28  5:17 UTC
  To: netdev; +Cc: therbert, stephen, James Yonan

rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS (receive flow
steering) to move a flow to a new CPU even if the old CPU queue has
pending packets.  Note that this can result in packets being delivered
out-of-order.  If set to 0 (the default), the previous behavior is
retained, where flows will not be moved as long as pending packets remain.

The motivation for this patch is that while it's good to prevent
out-of-order packets, the current RFS logic requires that all previous
packets for the flow have been dequeued before an RFS CPU switch is made,
so as to preserve in-order delivery.  In some cases, on links with heavy
VPN traffic, we have observed that this requirement is too onerous, and
that it prevents an RFS CPU switch from occurring within a reasonable time
frame if heavy traffic causes the old CPU queue to never fully drain.

So rps_allow_ooo allows the user to select the tradeoff between a more
aggressive RFS steering policy that may reorder packets on a CPU switch
event (rps_allow_ooo=1) vs. one that prioritizes in-order delivery
(rps_allow_ooo=0).

Patch history:

v2: based on feedback from Stephen Hemminger <stephen@networkplumber.org>,
    for clarity, refactor the big conditional in get_rps_cpu() that
    determines whether or not to switch the RPS CPU into a new inline
    function __should_reset_rps_cpu().

Signed-off-by: James Yonan <james@openvpn.net>
---
 Documentation/networking/scaling.rst | 12 ++++++
 include/linux/netdevice.h            |  1 +
 net/core/dev.c                       | 58 +++++++++++++++++++++-------
 net/core/sysctl_net_core.c           |  9 +++++
 4 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst
index f78d7bf27ff5..2c2e9007a5e0 100644
--- a/Documentation/networking/scaling.rst
+++ b/Documentation/networking/scaling.rst
@@ -313,6 +313,18 @@ there are no packets outstanding on the old CPU, as the outstanding
 packets could arrive later than those about to be processed on the new
 CPU.
 
+However, in some cases it may be desirable to relax the requirement
+that a flow only moves to a new CPU when there are no packets
+outstanding on the old CPU.  For this, we introduce::
+
+  /proc/sys/net/core/rps_allow_ooo (0|1, default=0)
+
+If set to 1, allow RFS to move a flow to a new CPU even if the old CPU
+queue has pending packets.  If set to 0 (the default), flows will not be
+moved as long as pending packets remain.  So rps_allow_ooo allows the
+user to choose the tradeoff between a more aggressive steering algorithm
+that can potentially reorder packets (rps_allow_ooo=1) vs. one that
+prioritizes in-order delivery (rps_allow_ooo=0).
 
 RFS Configuration
 -----------------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 288a58678256..dee6a481ec0d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -710,6 +710,7 @@ struct rps_sock_flow_table {
 
 #define RPS_NO_CPU 0xffff
 
+extern int rps_allow_ooo_sysctl;
 extern u32 rps_cpu_mask;
 extern struct rps_sock_flow_table __rcu *rps_sock_flow_table;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index a03036456221..962f3931de10 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3899,6 +3899,8 @@ struct rps_sock_flow_table __rcu *rps_sock_flow_table __read_mostly;
 EXPORT_SYMBOL(rps_sock_flow_table);
 u32 rps_cpu_mask __read_mostly;
 EXPORT_SYMBOL(rps_cpu_mask);
+int rps_allow_ooo_sysctl __read_mostly;
+EXPORT_SYMBOL(rps_allow_ooo_sysctl);
 
 struct static_key_false rps_needed __read_mostly;
 EXPORT_SYMBOL(rps_needed);
@@ -3950,6 +3952,45 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 	return rflow;
 }
 
+/* Helper function used by get_rps_cpu() below.
+ * Returns true if we should reset the current
+ * RPS CPU (tcpu) to next_cpu.
+ */
+static inline bool __should_reset_rps_cpu(const struct rps_dev_flow *rflow,
+					  const u32 tcpu,
+					  const u32 next_cpu)
+{
+	/* Is the desired CPU (next_cpu, where last recvmsg was done)
+	 * the same as the current CPU (tcpu from the rx-queue flow
+	 * table entry)?
+	 */
+	if (likely(tcpu == next_cpu))
+		return false; /* no action */
+
+	/* Is rps_allow_ooo_sysctl enabled? */
+	if (rps_allow_ooo_sysctl)
+		return true;  /* reset RPS CPU to next_cpu */
+
+	/* Is current CPU unset? */
+	if (unlikely(tcpu >= nr_cpu_ids))
+		return true;  /* reset RPS CPU to next_cpu */
+
+	/* Is current CPU offline? */
+	if (unlikely(!cpu_online(tcpu)))
+		return true;  /* reset RPS CPU to next_cpu */
+
+	/* Has the current CPU's queue tail advanced beyond the
+	 * last packet that was enqueued using this table entry?
+	 * If so, this guarantees that all previous packets for the
+	 * flow have been dequeued, thus preserving in-order delivery.
+	 */
+	if ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
+		  rflow->last_qtail) >= 0)
+		return true;  /* reset RPS CPU to next_cpu */
+
+	return false; /* no action */
+}
+
 /*
  * get_rps_cpu is called from netif_receive_skb and returns the target
  * CPU from the RPS map of the receiving queue for a given skb.
@@ -4010,21 +4051,8 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		rflow = &flow_table->flows[hash & flow_table->mask];
 		tcpu = rflow->cpu;
 
-		/*
-		 * If the desired CPU (where last recvmsg was done) is
-		 * different from current CPU (one in the rx-queue flow
-		 * table entry), switch if one of the following holds:
-		 *   - Current CPU is unset (>= nr_cpu_ids).
-		 *   - Current CPU is offline.
-		 *   - The current CPU's queue tail has advanced beyond the
-		 *     last packet that was enqueued using this table entry.
-		 *     This guarantees that all previous packets for the flow
-		 *     have been dequeued, thus preserving in order delivery.
-		 */
-		if (unlikely(tcpu != next_cpu) &&
-		    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
-		     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
-		      rflow->last_qtail)) >= 0)) {
+		/* Should we reset the current RPS CPU (tcpu) to next_cpu? */
+		if (__should_reset_rps_cpu(rflow, tcpu, next_cpu)) {
 			tcpu = next_cpu;
 			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
 		}
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 48041f50ecfb..20e82a432d88 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -471,6 +471,15 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= rps_sock_flow_sysctl
 	},
+	{
+		.procname	= "rps_allow_ooo",
+		.data		= &rps_allow_ooo_sysctl,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE
+	},
 #endif
 #ifdef CONFIG_NET_FLOW_LIMIT
 	{
-- 
2.25.1



* Re: [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: Jakub Kicinski @ 2022-06-28 17:03 UTC
  To: James Yonan; +Cc: netdev, therbert, stephen

On Mon, 27 Jun 2022 23:17:54 -0600 James Yonan wrote:
> rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS (receive flow
> steering) to move a flow to a new CPU even if the old CPU queue has
> pending packets.  Note that this can result in packets being delivered
> out-of-order.  If set to 0 (the default), the previous behavior is
> retained, where flows will not be moved as long as pending packets remain.
> 
> The motivation for this patch is that while it's good to prevent
> out-of-order packets, the current RFS logic requires that all previous
> packets for the flow have been dequeued before an RFS CPU switch is made,
> so as to preserve in-order delivery.  In some cases, on links with heavy
> VPN traffic, we have observed that this requirement is too onerous, and
> that it prevents an RFS CPU switch from occurring within a reasonable time
> frame if heavy traffic causes the old CPU queue to never fully drain.
> 
> So rps_allow_ooo allows the user to select the tradeoff between a more
> aggressive RFS steering policy that may reorder packets on a CPU switch
> event (rps_allow_ooo=1) vs. one that prioritizes in-order delivery
> (rps_allow_ooo=0).

Can you give a practical example where someone would enable this?
What is the traffic being served here that does not care about getting
severely chopped up? Also, why are you using RPS? It's 2022; don't all
devices of note have multi-queue support?


* Re: [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: Jakub Kicinski @ 2022-06-28 17:16 UTC
  To: James Yonan; +Cc: netdev, therbert, stephen

On Mon, 27 Jun 2022 23:17:54 -0600 James Yonan wrote:
> rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS (receive flow
> steering) to move a flow to a new CPU even if the old CPU queue has
> pending packets.  Note that this can result in packets being delivered
> out-of-order.  If set to 0 (the default), the previous behavior is
> retained, where flows will not be moved as long as pending packets remain.
> 
> The motivation for this patch is that while it's good to prevent
> out-of-order packets, the current RFS logic requires that all previous
> packets for the flow have been dequeued before an RFS CPU switch is made,
> so as to preserve in-order delivery.  In some cases, on links with heavy
> VPN traffic, we have observed that this requirement is too onerous, and
> that it prevents an RFS CPU switch from occurring within a reasonable time
> frame if heavy traffic causes the old CPU queue to never fully drain.
> 
> So rps_allow_ooo allows the user to select the tradeoff between a more
> aggressive RFS steering policy that may reorder packets on a CPU switch
> event (rps_allow_ooo=1) vs. one that prioritizes in-order delivery
> (rps_allow_ooo=0).

BTW please see:

https://lore.kernel.org/all/20220628051754.365238-1-james@openvpn.net/

Y'all should work together.


* Re: [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: James Yonan @ 2022-06-28 23:49 UTC
  To: Jakub Kicinski; +Cc: netdev, therbert, stephen

On 6/28/22 11:03, Jakub Kicinski wrote:
> On Mon, 27 Jun 2022 23:17:54 -0600 James Yonan wrote:
>> rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS (receive flow
>> steering) to move a flow to a new CPU even if the old CPU queue has
>> pending packets.  Note that this can result in packets being delivered
>> out-of-order.  If set to 0 (the default), the previous behavior is
>> retained, where flows will not be moved as long as pending packets remain.
>>
>> The motivation for this patch is that while it's good to prevent
>> out-of-order packets, the current RFS logic requires that all previous
>> packets for the flow have been dequeued before an RFS CPU switch is made,
>> so as to preserve in-order delivery.  In some cases, on links with heavy
>> VPN traffic, we have observed that this requirement is too onerous, and
>> that it prevents an RFS CPU switch from occurring within a reasonable time
>> frame if heavy traffic causes the old CPU queue to never fully drain.
>>
>> So rps_allow_ooo allows the user to select the tradeoff between a more
>> aggressive RFS steering policy that may reorder packets on a CPU switch
>> event (rps_allow_ooo=1) vs. one that prioritizes in-order delivery
>> (rps_allow_ooo=0).
> Can you give a practical example where someone would enable this?
> What is the traffic being served here that does not care about getting
> severely chopped up? Also why are you using RPS, it's 2022, don't all
> devices of note have multi-queue support?

So the problem with VPN transport is that you have encryption overhead 
that can be CPU intensive.  Suppose I can get 10 Gbps throughput per 
core.  Now suppose I have 4 different 10 Gbps sessions on my 4 core 
machine.  In a perfect world, each of those sessions would migrate to a 
different core and you would achieve the full parallelism of your 
hardware.  RFS helps to make this work, but the existing RFS algorithm 
sometimes gets stuck with multiple sessions on one core, while other 
cores are idle.  I found that this often occurs because RFS puts a high 
priority on maintaining in-order delivery, so once the queues are 
operating at full speed, it's very difficult to find an opportunity to 
switch CPUs without some packet reordering.  But the cost of being 
strict about packet reordering is that you end up with multiple sessions 
stuck on the same core, alongside idle cores.  This is solved by setting 
rps_allow_ooo to 1.  You might get a few reordered packets on the CPU 
switch event, but once the queues stabilize, you get significantly 
higher throughput because you can actively load balance the sessions 
across CPUs.
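
If it helps to visualize the problem, here is a rough userspace sketch
(not part of this patch) that prints the cumulative per-CPU softirq
packet counts from the first column of /proc/net/softnet_stat; sampling
it twice and diffing the counters shows whether the sessions are
actually spreading across cores once rps_allow_ooo is enabled:

  /* print per-CPU processed-packet counters from /proc/net/softnet_stat
   * (first hex column on each line = packets processed on that CPU);
   * counters are cumulative since boot, so sample twice and diff
   */
  #include <stdio.h>

  int main(void)
  {
  	FILE *f = fopen("/proc/net/softnet_stat", "r");
  	char line[512];
  	int cpu = 0;

  	if (!f)
  		return 1;
  	while (fgets(line, sizeof(line), f)) {
  		unsigned int processed;

  		if (sscanf(line, "%x", &processed) == 1)
  			printf("cpu%d: %u packets processed\n", cpu, processed);
  		cpu++;
  	}
  	fclose(f);
  	return 0;
  }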

Re: why are we still using RPS/RFS in 2022?  It's very useful for load 
balancing L4 sessions across multiple CPUs (not only across multiple net 
device queues).

James




* Re: [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: James Yonan @ 2022-07-11 20:38 UTC
  To: Jakub Kicinski; +Cc: netdev, therbert, stephen

On 6/28/22 17:49, James Yonan wrote:
> On 6/28/22 11:03, Jakub Kicinski wrote:
>> On Mon, 27 Jun 2022 23:17:54 -0600 James Yonan wrote:
>>> rps_allow_ooo (0|1, default=0) -- if set to 1, allow RFS (receive flow
>>> steering) to move a flow to a new CPU even if the old CPU queue has
>>> pending packets.  Note that this can result in packets being delivered
>>> out-of-order.  If set to 0 (the default), the previous behavior is
>>> retained, where flows will not be moved as long as pending packets 
>>> remain.
>>>
>>> The motivation for this patch is that while it's good to prevent
>>> out-of-order packets, the current RFS logic requires that all previous
>>> packets for the flow have been dequeued before an RFS CPU switch is 
>>> made,
>>> so as to preserve in-order delivery.  In some cases, on links with 
>>> heavy
>>> VPN traffic, we have observed that this requirement is too onerous, and
>>> that it prevents an RFS CPU switch from occurring within a 
>>> reasonable time
>>> frame if heavy traffic causes the old CPU queue to never fully drain.
>>>
>>> So rps_allow_ooo allows the user to select the tradeoff between a more
>>> aggressive RFS steering policy that may reorder packets on a CPU switch
>>> event (rps_allow_ooo=1) vs. one that prioritizes in-order delivery
>>> (rps_allow_ooo=0).
>> Can you give a practical example where someone would enable this?
>> What is the traffic being served here that does not care about getting
>> severely chopped up? Also why are you using RPS, it's 2022, don't all
>> devices of note have multi-queue support?
>
> So the problem with VPN transport is that you have encryption overhead 
> that can be CPU intensive.  Suppose I can get 10 Gbps throughput per 
> core.  Now suppose I have 4 different 10 Gbps sessions on my 4 core 
> machine.  In a perfect world, each of those sessions would migrate to 
> a different core and you would achieve the full parallelism of your 
> hardware.  RFS helps to make this work, but the existing RFS algorithm 
> sometimes gets stuck with multiple sessions on one core, while other 
> cores are idle.  I found that this often occurs because RFS puts a 
> high priority on maintaining in-order delivery, so once the queues are 
> operating at full speed, it's very difficult to find an opportunity to 
> switch CPUs without some packet reordering.  But the cost of being 
> strict about packet reordering is that you end up with multiple 
> sessions stuck on the same core, alongside idle cores.  This is solved 
> by setting rps_allow_ooo to 1.  You might get a few reordered packets 
> on the CPU switch event, but once the queues stabilize, you get 
> significantly higher throughput because you can actively load balance 
> the sessions across CPUs.
>
> Re: why are we still using RPS/RFS in 2022?  It's very useful for load 
> balancing L4 sessions across multiple CPUs (not only across multiple 
> net device queues).

Any further questions/comments about this patch?  The v2 patch 
incorporates all feedback received so far, including refactoring a large 
conditional in the original code to make it more readable and maintainable.

James



* Re: [PATCH net-next v2] rfs: added /proc/sys/net/core/rps_allow_ooo flag to tweak flow alg
From: Jakub Kicinski @ 2022-07-11 20:44 UTC
  To: James Yonan; +Cc: netdev, therbert, stephen

On Mon, 11 Jul 2022 14:38:39 -0600 James Yonan wrote:
> Any further questions/comments about this patch?  The v2 patch 
> incorporates all feedback received so far, including refactoring a large 
> conditional in the original code to make it more readable and maintainable.

There was another patch on the list doing a very similar thing right
around the time you posted yours. IIUC wireguard implements farming
out decryption to multiple cores; implementing something in the tunnel
layer seems better. You can do the crypto in process context, which is
much more efficient, and put the packets back in order while you're at it.

