* [PATCH net-next v2 0/3] net: introduce rps_default_mask
@ 2020-10-30 11:16 Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 1/3] net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Paolo Abeni @ 2020-10-30 11:16 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Corbet, David S. Miller, Jakub Kicinski, Shuah Khan,
	linux-doc, linux-kselftest, Marcelo Tosatti

Real-time setups try hard to ensure proper isolation between
time-critical applications and e.g. network processing performed by
the network stack in softirq; RPS is used to move the softirq
activity away from the isolated cores.
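
For reference, RPS is normally configured per device and per receive
queue via a sysfs cpumask - e.g., assuming a device named eth0 and
CPUs 0-1 reserved for network processing (names and mask are purely
illustrative):

    echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus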

If the network configuration is dynamic, with netns and devices
routinely created at run-time, enforcing the correct RPS setting
on each newly created device without allowing any transient bad
configuration becomes complex.

This series tries to address the above, introducing a new
sysctl knob: rps_default_mask. The new sysctl entry allows
configuring a system-wide RPS mask, enforced from receive queue
creation time onwards, without any further per-device configuration
required.
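
For example, assuming CPUs 0-1 are the housekeeping CPUs, a single
write would then cover every device created afterwards (an
illustrative sketch):

    echo 3 > /proc/sys/net/core/rps_default_mask
    ip link add type veth   # rx queues of the new devices inherit mask 3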

Additionally, a simple self-test is introduced to check the 
rps_default_mask behavior.

v1 -> v2:
 - fix sparse warning in patch 2/3

Paolo Abeni (3):
  net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
  net/core: introduce default_rps_mask netns attribute
  self-tests: introduce self-tests for RPS default mask

 Documentation/admin-guide/sysctl/net.rst      |  6 ++
 include/linux/netdevice.h                     |  1 +
 net/core/net-sysfs.c                          | 73 +++++++++++--------
 net/core/sysctl_net_core.c                    | 58 +++++++++++++++
 tools/testing/selftests/net/Makefile          |  1 +
 tools/testing/selftests/net/config            |  3 +
 .../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++
 7 files changed, 169 insertions(+), 30 deletions(-)
 create mode 100755 tools/testing/selftests/net/rps_default_mask.sh

-- 
2.26.2



* [PATCH net-next v2 1/3] net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper
  2020-10-30 11:16 [PATCH net-next v2 0/3] net: introduce rps_default_mask Paolo Abeni
@ 2020-10-30 11:16 ` Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 2/3] net/core: introduce default_rps_mask netns attribute Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2020-10-30 11:16 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Corbet, David S. Miller, Jakub Kicinski, Shuah Khan,
	linux-doc, linux-kselftest, Marcelo Tosatti

Will simplify the following patch. No functional change
intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/net-sysfs.c | 66 ++++++++++++++++++++++++--------------------
 1 file changed, 36 insertions(+), 30 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 94fff0700bdd..b57426707216 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -737,42 +737,18 @@ static ssize_t show_rps_map(struct netdev_rx_queue *queue, char *buf)
 	return len < PAGE_SIZE ? len : -EINVAL;
 }
 
-static ssize_t store_rps_map(struct netdev_rx_queue *queue,
-			     const char *buf, size_t len)
+static int netdev_rx_queue_set_rps_mask(struct netdev_rx_queue *queue,
+					cpumask_var_t mask)
 {
-	struct rps_map *old_map, *map;
-	cpumask_var_t mask;
-	int err, cpu, i, hk_flags;
 	static DEFINE_MUTEX(rps_map_mutex);
-
-	if (!capable(CAP_NET_ADMIN))
-		return -EPERM;
-
-	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
-		return -ENOMEM;
-
-	err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
-	if (err) {
-		free_cpumask_var(mask);
-		return err;
-	}
-
-	if (!cpumask_empty(mask)) {
-		hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
-		cpumask_and(mask, mask, housekeeping_cpumask(hk_flags));
-		if (cpumask_empty(mask)) {
-			free_cpumask_var(mask);
-			return -EINVAL;
-		}
-	}
+	struct rps_map *old_map, *map;
+	int cpu, i;
 
 	map = kzalloc(max_t(unsigned int,
 			    RPS_MAP_SIZE(cpumask_weight(mask)), L1_CACHE_BYTES),
 		      GFP_KERNEL);
-	if (!map) {
-		free_cpumask_var(mask);
+	if (!map)
 		return -ENOMEM;
-	}
 
 	i = 0;
 	for_each_cpu_and(cpu, mask, cpu_online_mask)
@@ -799,9 +775,39 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 
 	if (old_map)
 		kfree_rcu(old_map, rcu);
+	return 0;
+}
+
+static ssize_t store_rps_map(struct netdev_rx_queue *queue,
+			     const char *buf, size_t len)
+{
+	cpumask_var_t mask;
+	int err, hk_flags;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
+	if (err)
+		goto out;
 
+	if (!cpumask_empty(mask)) {
+		hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
+		cpumask_and(mask, mask, housekeeping_cpumask(hk_flags));
+		if (cpumask_empty(mask)) {
+			err = -EINVAL;
+			goto out;
+		}
+	}
+
+	err = netdev_rx_queue_set_rps_mask(queue, mask);
+
+out:
 	free_cpumask_var(mask);
-	return len;
+	return err ? : len;
 }
 
 static ssize_t show_rps_dev_flow_table_cnt(struct netdev_rx_queue *queue,
-- 
2.26.2



* [PATCH net-next v2 2/3] net/core: introduce default_rps_mask netns attribute
  2020-10-30 11:16 [PATCH net-next v2 0/3] net: introduce rps_default_mask Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 1/3] net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper Paolo Abeni
@ 2020-10-30 11:16 ` Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 3/3] self-tests: introduce self-tests for RPS default mask Paolo Abeni
  2020-11-02 22:54 ` [PATCH net-next v2 0/3] net: introduce rps_default_mask Jakub Kicinski
  3 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2020-10-30 11:16 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Corbet, David S. Miller, Jakub Kicinski, Shuah Khan,
	linux-doc, linux-kselftest, Marcelo Tosatti

If RPS is enabled, this allows configuring a default rps
mask, which is effective from receive queue creation time.

A default RPS mask allows the system admin to ensure proper
isolation, avoiding races at network namespace or device
creation time.

The default RPS mask is initially empty, and can be
modified via a newly added sysctl entry.
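
E.g. (an illustrative sketch; note that a non-empty written value is
additionally restricted to the housekeeping CPUs, per the sysctl
handler below):

    echo 3 > /proc/sys/net/core/rps_default_mask
    cat /proc/sys/net/core/rps_default_mask   # -> 3 (printed width
                                              #    depends on cpu count)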

v1 -> v2:
 - declare rps_default_mask in netdevice.h to avoid a
   sparse warning - Jakub

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 Documentation/admin-guide/sysctl/net.rst |  6 +++
 include/linux/netdevice.h                |  1 +
 net/core/net-sysfs.c                     |  7 +++
 net/core/sysctl_net_core.c               | 58 ++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 57fd6ce68fe0..818cb2030a8b 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -211,6 +211,12 @@ rmem_max
 
 The maximum receive socket buffer size in bytes.
 
+rps_default_mask
+----------------
+
+The default RPS CPU mask used on newly created network devices. An empty
+mask means RPS is disabled by default.
+
 tstamp_allow_data
 -----------------
 Allow processes to receive tx timestamps looped together with the original
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 964b494b0e8d..2593689648d3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -200,6 +200,7 @@ struct net_device_stats {
 #include <linux/static_key.h>
 extern struct static_key_false rps_needed;
 extern struct static_key_false rfs_needed;
+extern struct cpumask rps_default_mask;
 #endif
 
 struct neighbour;
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b57426707216..3f3d1d467fe0 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -983,6 +983,13 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
 			goto err;
 	}
 
+#if IS_ENABLED(CONFIG_RPS) && IS_ENABLED(CONFIG_SYSCTL)
+	if (!cpumask_empty(&rps_default_mask)) {
+		error = netdev_rx_queue_set_rps_mask(queue, &rps_default_mask);
+		if (error)
+			goto err;
+	}
+#endif
 	kobject_uevent(kobj, KOBJ_ADD);
 
 	return error;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index d86d8d11cfe4..13451ac88a74 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -15,6 +15,7 @@
 #include <linux/vmalloc.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/sched/isolation.h>
 
 #include <net/ip.h>
 #include <net/sock.h>
@@ -46,6 +47,54 @@ int sysctl_devconf_inherit_init_net __read_mostly;
 EXPORT_SYMBOL(sysctl_devconf_inherit_init_net);
 
 #ifdef CONFIG_RPS
+struct cpumask rps_default_mask;
+
+static int rps_default_mask_sysctl(struct ctl_table *table, int write,
+				   void *buffer, size_t *lenp, loff_t *ppos)
+{
+	int len, err = 0;
+
+	rtnl_lock();
+	if (write) {
+		err = cpumask_parse(buffer, &rps_default_mask);
+		if (err)
+			goto done;
+
+		if (!cpumask_empty(&rps_default_mask)) {
+			int hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
+			cpumask_and(&rps_default_mask, &rps_default_mask,
+				    housekeeping_cpumask(hk_flags));
+			if (cpumask_empty(&rps_default_mask)) {
+				err = -EINVAL;
+				goto done;
+			}
+		}
+	} else {
+		char kbuf[128];
+
+		if (*ppos || !*lenp) {
+			*lenp = 0;
+			goto done;
+		}
+
+		len = min(sizeof(kbuf) - 1, *lenp);
+		len = scnprintf(kbuf, len, "%*pb", cpumask_pr_args(&rps_default_mask));
+		if (!len) {
+			*lenp = 0;
+			goto done;
+		}
+		if (len < *lenp)
+			kbuf[len++] = '\n';
+		memcpy(buffer, kbuf, len);
+		*lenp = len;
+		*ppos += len;
+	}
+
+done:
+	rtnl_unlock();
+	return err;
+}
+
 static int rps_sock_flow_sysctl(struct ctl_table *table, int write,
 				void *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -466,6 +515,11 @@ static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= rps_sock_flow_sysctl
 	},
+	{
+		.procname	= "rps_default_mask",
+		.mode		= 0644,
+		.proc_handler	= rps_default_mask_sysctl
+	},
 #endif
 #ifdef CONFIG_NET_FLOW_LIMIT
 	{
@@ -648,6 +702,10 @@ static __net_initdata struct pernet_operations sysctl_core_ops = {
 
 static __init int sysctl_core_init(void)
 {
+#if IS_ENABLED(CONFIG_RPS)
+	cpumask_copy(&rps_default_mask, cpu_none_mask);
+#endif
+
 	register_net_sysctl(&init_net, "net/core", net_core_table);
 	return register_pernet_subsys(&sysctl_core_ops);
 }
-- 
2.26.2



* [PATCH net-next v2 3/3] self-tests: introduce self-tests for RPS default mask
  2020-10-30 11:16 [PATCH net-next v2 0/3] net: introduce rps_default_mask Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 1/3] net/sysctl: factor-out netdev_rx_queue_set_rps_mask() helper Paolo Abeni
  2020-10-30 11:16 ` [PATCH net-next v2 2/3] net/core: introduce default_rps_mask netns attribute Paolo Abeni
@ 2020-10-30 11:16 ` Paolo Abeni
  2020-11-02 22:54 ` [PATCH net-next v2 0/3] net: introduce rps_default_mask Jakub Kicinski
  3 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2020-10-30 11:16 UTC (permalink / raw)
  To: netdev
  Cc: Jonathan Corbet, David S. Miller, Jakub Kicinski, Shuah Khan,
	linux-doc, linux-kselftest, Marcelo Tosatti

Ensure that RPS default mask changes take place on
all newly created netns/devices and don't affect
existing ones.
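
The test can be run directly, e.g. (it needs root to write the
sysctl):

    cd tools/testing/selftests/net && sudo ./rps_default_mask.sh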

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/Makefile          |  1 +
 tools/testing/selftests/net/config            |  3 +
 .../testing/selftests/net/rps_default_mask.sh | 57 +++++++++++++++++++
 3 files changed, 61 insertions(+)
 create mode 100755 tools/testing/selftests/net/rps_default_mask.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index ef352477cac6..2531ec3e5027 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -21,6 +21,7 @@ TEST_PROGS += rxtimestamp.sh
 TEST_PROGS += devlink_port_split.py
 TEST_PROGS += drop_monitor_tests.sh
 TEST_PROGS += vrf_route_leaking.sh
+TEST_PROGS += rps_default_mask.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 4d5df8e1eee7..5d467364f082 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -34,3 +34,6 @@ CONFIG_TRACEPOINTS=y
 CONFIG_NET_DROP_MONITOR=m
 CONFIG_NETDEVSIM=m
 CONFIG_NET_FOU=m
+CONFIG_RPS=y
+CONFIG_SYSFS=y
+CONFIG_PROC_SYSCTL=y
diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh
new file mode 100755
index 000000000000..c81c0ac7ddfe
--- /dev/null
+++ b/tools/testing/selftests/net/rps_default_mask.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+readonly ksft_skip=4
+readonly cpus=$(nproc)
+ret=0
+
+[ $cpus -gt 2 ] || exit $ksft_skip
+
+readonly INITIAL_RPS_DEFAULT_MASK=$(cat /proc/sys/net/core/rps_default_mask)
+readonly NETNS="ns-$(mktemp -u XXXXXX)"
+
+setup() {
+	ip netns add "${NETNS}"
+	ip -netns "${NETNS}" link set lo up
+}
+
+cleanup() {
+	echo $INITIAL_RPS_DEFAULT_MASK > /proc/sys/net/core/rps_default_mask
+	ip netns del $NETNS
+}
+
+chk_rps() {
+	local rps_mask expected_rps_mask=$3
+	local dev_name=$2
+	local msg=$1
+
+	rps_mask=$(ip netns exec $NETNS cat /sys/class/net/$dev_name/queues/rx-0/rps_cpus)
+	printf "%-60s" "$msg"
+	if [ $rps_mask -eq $expected_rps_mask ]; then
+		echo "[ ok ]"
+	else
+		echo "[fail] expected $expected_rps_mask found $rps_mask"
+		ret=1
+	fi
+}
+
+trap cleanup EXIT
+
+echo 0 > /proc/sys/net/core/rps_default_mask
+setup
+chk_rps "empty rps_default_mask" lo 0
+cleanup
+
+echo 1 > /proc/sys/net/core/rps_default_mask
+setup
+chk_rps "non zero rps_default_mask" lo 1
+
+echo 3 > /proc/sys/net/core/rps_default_mask
+chk_rps "changing rps_default_mask dont affect existing netns" lo 1
+
+ip -n $NETNS link add type veth
+ip -n $NETNS link set dev veth0 up
+ip -n $NETNS link set dev veth1 up
+chk_rps "changing rps_default_mask affect newly created devices" veth0 3
+chk_rps "changing rps_default_mask affect newly created devices[II]" veth1 3
+exit $ret
-- 
2.26.2



* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-10-30 11:16 [PATCH net-next v2 0/3] net: introduce rps_default_mask Paolo Abeni
                   ` (2 preceding siblings ...)
  2020-10-30 11:16 ` [PATCH net-next v2 3/3] self-tests: introduce self-tests for RPS default mask Paolo Abeni
@ 2020-11-02 22:54 ` Jakub Kicinski
  2020-11-02 23:27   ` Saeed Mahameed
  2020-11-03 15:22   ` Paolo Abeni
  3 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2020-11-02 22:54 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jonathan Corbet, David S. Miller, Shuah Khan, linux-doc,
	linux-kselftest, Marcelo Tosatti

On Fri, 30 Oct 2020 12:16:00 +0100 Paolo Abeni wrote:
> Real-time setups try hard to ensure proper isolation between
> time-critical applications and e.g. network processing performed by
> the network stack in softirq; RPS is used to move the softirq
> activity away from the isolated cores.
> 
> If the network configuration is dynamic, with netns and devices
> routinely created at run-time, enforcing the correct RPS setting
> on each newly created device without allowing any transient bad
> configuration becomes complex.
> 
> This series tries to address the above, introducing a new
> sysctl knob: rps_default_mask. The new sysctl entry allows
> configuring a system-wide RPS mask, enforced from receive queue
> creation time onwards, without any further per-device configuration
> required.
> 
> Additionally, a simple self-test is introduced to check the
> rps_default_mask behavior.

RPS is disabled by default, the processing is going to happen wherever
the IRQ is mapped, and one would hope that the IRQ is not mapped to the
core where the critical processing runs.
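
e.g., steering the NIC interrupts with something like the following
(an illustrative sketch, assuming the IRQ number is known):

    echo 3 > /proc/irq/<nic irq>/smp_affinity   # keep on CPUs 0-1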

Would you mind elaborating further on the use case?


* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-02 22:54 ` [PATCH net-next v2 0/3] net: introduce rps_default_mask Jakub Kicinski
@ 2020-11-02 23:27   ` Saeed Mahameed
  2020-11-03 15:22   ` Paolo Abeni
  1 sibling, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2020-11-02 23:27 UTC (permalink / raw)
  To: Jakub Kicinski, Paolo Abeni
  Cc: netdev, Jonathan Corbet, David S. Miller, Shuah Khan, linux-doc,
	linux-kselftest, Marcelo Tosatti

On Mon, 2020-11-02 at 14:54 -0800, Jakub Kicinski wrote:
> On Fri, 30 Oct 2020 12:16:00 +0100 Paolo Abeni wrote:
> > Real-time setups try hard to ensure proper isolation between
> > time-critical applications and e.g. network processing performed by
> > the network stack in softirq; RPS is used to move the softirq
> > activity away from the isolated cores.
> > 
> > If the network configuration is dynamic, with netns and devices
> > routinely created at run-time, enforcing the correct RPS setting
> > on each newly created device without allowing any transient bad
> > configuration becomes complex.
> > 
> > This series tries to address the above, introducing a new
> > sysctl knob: rps_default_mask. The new sysctl entry allows
> > configuring a system-wide RPS mask, enforced from receive queue
> > creation time onwards, without any further per-device configuration
> > required.
> > 

The whole thing can be replaced with a user-space daemon/script that
monitors all newly created devices and assigns to them whatever rps
mask you like (call it the default).
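
Something like this rough, untested sketch, perhaps (netns handling,
which is the fiddly part, is omitted; $DEFAULT_RPS_MASK is an assumed,
admin-chosen value):

    ip monitor link | while read -r idx dev rest; do
        # only process the "<idx>: <name>: ..." event lines
        case "$idx" in [0-9]*:) ;; *) continue ;; esac
        dev=${dev%%@*}; dev=${dev%:}
        for f in /sys/class/net/"$dev"/queues/rx-*/rps_cpus; do
            [ -e "$f" ] && echo "$DEFAULT_RPS_MASK" > "$f"
        done
    done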

So why do we need this special logic in the kernel?

I am not sure about this, but if the rps queue sysfs entries are
available before the netdev is up, then you can also use udevd to
assign the rps masks before such devices are even brought up; that
way you would avoid the race conditions that you described, which are
not really clear to me, to be honest.
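
Something along these lines, perhaps (again an untested sketch; the
rule path and the deliberately broad match are hypothetical):

    # /etc/udev/rules.d/90-rps.rules
    ACTION=="add", SUBSYSTEM=="net", ATTR{queues/rx-0/rps_cpus}="3"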

> > Additionally, a simple self-test is introduced to check the 
> > rps_default_mask behavior.
> 
> RPS is disabled by default, the processing is going to happen wherever
> the IRQ is mapped, and one would hope that the IRQ is not mapped to the
> core where the critical processing runs.
> 
> Would you mind elaborating further on the use case?



* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-02 22:54 ` [PATCH net-next v2 0/3] net: introduce rps_default_mask Jakub Kicinski
  2020-11-02 23:27   ` Saeed Mahameed
@ 2020-11-03 15:22   ` Paolo Abeni
  2020-11-03 16:52     ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2020-11-03 15:22 UTC (permalink / raw)
  To: Jakub Kicinski, Saeed Mahameed
  Cc: netdev, Jonathan Corbet, David S. Miller, Shuah Khan, linux-doc,
	linux-kselftest, Marcelo Tosatti

On Mon, 2020-11-02 at 14:54 -0800, Jakub Kicinski wrote:
> On Fri, 30 Oct 2020 12:16:00 +0100 Paolo Abeni wrote:
> > Real-time setups try hard to ensure proper isolation between
> > time-critical applications and e.g. network processing performed by
> > the network stack in softirq; RPS is used to move the softirq
> > activity away from the isolated cores.
> > 
> > If the network configuration is dynamic, with netns and devices
> > routinely created at run-time, enforcing the correct RPS setting
> > on each newly created device without allowing any transient bad
> > configuration becomes complex.
> > 
> > This series tries to address the above, introducing a new
> > sysctl knob: rps_default_mask. The new sysctl entry allows
> > configuring a system-wide RPS mask, enforced from receive queue
> > creation time onwards, without any further per-device configuration
> > required.
> > 
> > Additionally, a simple self-test is introduced to check the
> > rps_default_mask behavior.
> 
> RPS is disabled by default, the processing is going to happen wherever
> the IRQ is mapped, and one would hope that the IRQ is not mapped to the
> core where the critical processing runs.
> 
> Would you mind elaborating further on the use case?

On Mon, 2020-11-02 at 15:27 -0800, Saeed Mahameed wrote:
> The whole thing can be replaced with a user daemon scripts that
> monitors all newly created devices and assign to them whatever rps mask
> (call it default).
> 
> So why do we need this special logic in kernel ? 
> 
> I am not sure about this, but if rps queues sysfs are available before
> the netdev is up, then you can also use udevd to assign the rps masks
> before such devices are even brought up, so you would avoid the race
> conditions that you described, which are not really clear to me to be
> honest.

Thank you for the feedback.

Please allow me to answer you both here, as your questions are related.

The relevant use case is a host running containers (with the related
orchestration tools) in a RT environment. Virtual devices (veths, ovs
ports, etc.) are created by the orchestration tools at run-time.
Critical processes are allowed to send packets/generate outgoing
network traffic - but any interrupt is moved away from the related
cores, so that the usual incoming network traffic processing does not
happen there.

Still, a packet transmitted on a virtual device may be forwarded via
ovs or veth, with the relevant forwarding operation happening in a
softirq on the same CPU originating the packet.

RPS is configured (even) on such virtual devices to move the
forwarding away from the relevant CPUs.

As Saeed noted, such configuration could possibly be performed via
some user-space daemon monitoring network devices and network
namespaces creation. That will anyway be prone to some race: the
orchestration tool may create and enable the netns and virtual
devices before the daemon has properly set the RPS mask.

In the latter scenario some packet forwarding could still slip onto
the relevant CPU, causing measurable latency. In all non-RT scenarios
the above will likely be irrelevant, but in the RT context that is
not acceptable - e.g. it causes latency above the defined limits in
real environments, while the proposed patches avoid the issue.

Do you see any other simple way to avoid the above race?

Please let me know if the above answers your doubts,

Paolo



* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-03 15:22   ` Paolo Abeni
@ 2020-11-03 16:52     ` Jakub Kicinski
  2020-11-04 17:36       ` Paolo Abeni
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-11-03 16:52 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Saeed Mahameed, netdev, Jonathan Corbet, David S. Miller,
	Shuah Khan, linux-doc, linux-kselftest, Marcelo Tosatti,
	Daniel Borkmann

On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> On Mon, 2020-11-02 at 14:54 -0800, Jakub Kicinski wrote:
> > On Fri, 30 Oct 2020 12:16:00 +0100 Paolo Abeni wrote:  
> > > Real-time setups try hard to ensure proper isolation between
> > > time-critical applications and e.g. network processing performed by
> > > the network stack in softirq; RPS is used to move the softirq
> > > activity away from the isolated cores.
> > > 
> > > If the network configuration is dynamic, with netns and devices
> > > routinely created at run-time, enforcing the correct RPS setting
> > > on each newly created device without allowing any transient bad
> > > configuration becomes complex.
> > > 
> > > This series tries to address the above, introducing a new
> > > sysctl knob: rps_default_mask. The new sysctl entry allows
> > > configuring a system-wide RPS mask, enforced from receive queue
> > > creation time onwards, without any further per-device configuration
> > > required.
> > > 
> > > Additionally, a simple self-test is introduced to check the
> > > rps_default_mask behavior.
> > 
> > RPS is disabled by default, the processing is going to happen wherever
> > the IRQ is mapped, and one would hope that the IRQ is not mapped to the
> > core where the critical processing runs.
> > 
> > Would you mind elaborating further on the use case?  
> 
> On Mon, 2020-11-02 at 15:27 -0800, Saeed Mahameed wrote:
> > The whole thing can be replaced with a user daemon scripts that
> > monitors all newly created devices and assign to them whatever rps mask
> > (call it default).
> > 
> > So why do we need this special logic in kernel ? 
> > 
> > I am not sure about this, but if rps queues sysfs are available before
> > the netdev is up, then you can also use udevd to assign the rps masks
> > before such devices are even brought up, so you would avoid the race
> > conditions that you described, which are not really clear to me to be
> > honest.  
> 
> Thank you for the feedback.
> 
> Please allow me to answer you both here, as your questions are related.
> 
> The relevant use case is a host running containers (with the related
> orchestration tools) in a RT environment. Virtual devices (veths, ovs
> ports, etc.) are created by the orchestration tools at run-time.
> Critical processes are allowed to send packets/generate outgoing
> network traffic - but any interrupt is moved away from the related
> cores, so that the usual incoming network traffic processing does not
> happen there.
> 
> Still, a packet transmitted on a virtual device may be forwarded via
> ovs or veth, with the relevant forwarding operation happening in a
> softirq on the same CPU originating the packet.
> 
> RPS is configured (even) on such virtual devices to move the
> forwarding away from the relevant CPUs.
> 
> As Saeed noted, such configuration could possibly be performed via
> some user-space daemon monitoring network devices and network
> namespaces creation. That will anyway be prone to some race: the
> orchestration tool may create and enable the netns and virtual
> devices before the daemon has properly set the RPS mask.
> 
> In the latter scenario some packet forwarding could still slip onto
> the relevant CPU, causing measurable latency. In all non-RT scenarios
> the above will likely be irrelevant, but in the RT context that is
> not acceptable - e.g. it causes latency above the defined limits in
> real environments, while the proposed patches avoid the issue.
> 
> Do you see any other simple way to avoid the above race?
> 
> Please let me know if the above answers your doubts,

Thanks, that makes it clearer now.

Depending on how RT-aware your container management is it may or may not
be the right place to configure this, as it creates the veth interface.
Presumably it's the container management which does the placement of
the tasks to cores, why is it not setting other attributes, like RPS?

Also I wonder if it would make sense to turn this knob into something
more generic. When we arrive at the threaded NAPIs - could it make
sense for the threads to inherit your mask as the CPUs they are allowed
to run on?


* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-03 16:52     ` Jakub Kicinski
@ 2020-11-04 17:36       ` Paolo Abeni
  2020-11-04 18:18         ` Marcelo Tosatti
  2020-11-04 19:42         ` Jakub Kicinski
  0 siblings, 2 replies; 13+ messages in thread
From: Paolo Abeni @ 2020-11-04 17:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, netdev, Jonathan Corbet, David S. Miller,
	Shuah Khan, linux-doc, linux-kselftest, Marcelo Tosatti,
	Daniel Borkmann

On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> > The relevant use case is a host running containers (with the related
> > orchestration tools) in a RT environment. Virtual devices (veths, ovs
> > ports, etc.) are created by the orchestration tools at run-time.
> > Critical processes are allowed to send packets/generate outgoing
> > network traffic - but any interrupt is moved away from the related
> > cores, so that the usual incoming network traffic processing does not
> > happen there.
> > 
> > Still, a packet transmitted on a virtual device may be forwarded via
> > ovs or veth, with the relevant forwarding operation happening in a
> > softirq on the same CPU originating the packet.
> > 
> > RPS is configured (even) on such virtual devices to move the
> > forwarding away from the relevant CPUs.
> > 
> > As Saeed noted, such configuration could possibly be performed via
> > some user-space daemon monitoring network devices and network
> > namespaces creation. That will anyway be prone to some race: the
> > orchestration tool may create and enable the netns and virtual
> > devices before the daemon has properly set the RPS mask.
> > 
> > In the latter scenario some packet forwarding could still slip onto
> > the relevant CPU, causing measurable latency. In all non-RT scenarios
> > the above will likely be irrelevant, but in the RT context that is
> > not acceptable - e.g. it causes latency above the defined limits in
> > real environments, while the proposed patches avoid the issue.
> > 
> > Do you see any other simple way to avoid the above race?
> > 
> > Please let me know if the above answers your doubts,
> 
> Thanks, that makes it clearer now.
> 
> Depending on how RT-aware your container management is it may or may not
> be the right place to configure this, as it creates the veth interface.
> Presumably it's the container management which does the placement of
> the tasks to cores, why is it not setting other attributes, like RPS?

The container orchestration is quite complex, and I'm unsure whether
isolation and networking configuration are performed (or can be
performed) by the same process (without a heavy refactor).

On the other hand, the global rps mask knob looked quite
straightforward to me.

Possibly I can reduce the amount of new code introduced by this
patchset by removing some code duplication between
rps_default_mask_sysctl() and flow_limit_cpu_sysctl(). Would that
make this change more acceptable? Or should I drop this altogether?

> Also I wonder if it would make sense to turn this knob into something
> more generic. When we arrive at the threaded NAPIs - could it make
> sense for the threads to inherit your mask as the CPUs they are allowed
> to run on?

I personally *think* this would be fine - and good. But isn't it a bit
premature to discuss the integration of 2 missing pieces? :)

Thanks,

Paolo



* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-04 17:36       ` Paolo Abeni
@ 2020-11-04 18:18         ` Marcelo Tosatti
  2020-11-04 19:42         ` Jakub Kicinski
  1 sibling, 0 replies; 13+ messages in thread
From: Marcelo Tosatti @ 2020-11-04 18:18 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jakub Kicinski, Saeed Mahameed, netdev, Jonathan Corbet,
	David S. Miller, Shuah Khan, linux-doc, linux-kselftest,
	Daniel Borkmann

On Wed, Nov 04, 2020 at 06:36:08PM +0100, Paolo Abeni wrote:
> On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> > > The relevant use case is a host running containers (with the related
> > > orchestration tools) in a RT environment. Virtual devices (veths, ovs
> > > ports, etc.) are created by the orchestration tools at run-time.
> > > Critical processes are allowed to send packets/generate outgoing
> > > network traffic - but any interrupt is moved away from the related
> > > cores, so that the usual incoming network traffic processing does not
> > > happen there.
> > > 
> > > Still, a packet transmitted on a virtual device may be forwarded via
> > > ovs or veth, with the relevant forwarding operation happening in a
> > > softirq on the same CPU originating the packet.
> > > 
> > > RPS is configured (even) on such virtual devices to move the
> > > forwarding away from the relevant CPUs.
> > > 
> > > As Saeed noted, such configuration could possibly be performed via
> > > some user-space daemon monitoring network devices and network
> > > namespaces creation. That will anyway be prone to some race: the
> > > orchestration tool may create and enable the netns and virtual
> > > devices before the daemon has properly set the RPS mask.
> > > 
> > > In the latter scenario some packet forwarding could still slip onto
> > > the relevant CPU, causing measurable latency. In all non-RT scenarios
> > > the above will likely be irrelevant, but in the RT context that is
> > > not acceptable - e.g. it causes latency above the defined limits in
> > > real environments, while the proposed patches avoid the issue.
> > > 
> > > Do you see any other simple way to avoid the above race?
> > > 
> > > Please let me know if the above answers your doubts,
> > 
> > Thanks, that makes it clearer now.
> > 
> > Depending on how RT-aware your container management is it may or may not
> > be the right place to configure this, as it creates the veth interface.
> > Presumably it's the container management which does the placement of
> > the tasks to cores, why is it not setting other attributes, like RPS?
> 
> The container orchestration is quite complex, and I'm unsure whether
> isolation and networking configuration are performed (or can be
> performed) by the same process (without a heavy refactor).

Also on the host side (no containers) the same issue will have to be
handled for PCI hotplug, for example. So this fix will have to be
performed in every tool that decides to create a network device (while
a kernel solution is global).

> On the other hand, the global rps mask knob looked quite
> straightforward to me.
> 
> Possibly I can reduce the amount of new code introduced by this
> patchset by removing some code duplication between
> rps_default_mask_sysctl() and flow_limit_cpu_sysctl(). Would that
> make this change more acceptable? Or should I drop this altogether?
> 
> > Also I wonder if it would make sense to turn this knob into something
> > more generic. When we arrive at the threaded NAPIs - could it make
> > sense for the threads to inherit your mask as the CPUs they are allowed
> > to run on?
> 
> I personally *think* this would be fine - and good. But isn't it a bit
> premature to discuss the integration of 2 missing pieces? :)
> 
> Thanks,
> 
> Paolo

About the potential race:

0) network device creation starts, inherits old default_rps_mask,
network device init sleeps
1) set default_rps_mask (new) 
2) change all devices across all network namespaces (walk /sys)
3) network device init wakes up, new device shows up in /sys/ using old default_rps_mask

Why can't this happen?






* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-04 17:36       ` Paolo Abeni
  2020-11-04 18:18         ` Marcelo Tosatti
@ 2020-11-04 19:42         ` Jakub Kicinski
  2023-01-30  9:25           ` Paolo Abeni
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2020-11-04 19:42 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Saeed Mahameed, netdev, Jonathan Corbet, David S. Miller,
	Shuah Khan, linux-doc, linux-kselftest, Marcelo Tosatti,
	Daniel Borkmann

On Wed, 04 Nov 2020 18:36:08 +0100 Paolo Abeni wrote:
> On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:  
> > > The relevant use case is a host running containers (with the related
> > > orchestration tools) in a RT environment. Virtual devices (veths, ovs
> > > ports, etc.) are created by the orchestration tools at run-time.
> > > Critical processes are allowed to send packets/generate outgoing
> > > network traffic - but any interrupt is moved away from the related
> > > cores, so that the usual incoming network traffic processing does not
> > > happen there.
> > > 
> > > Still, a packet transmitted on a virtual device may be forwarded via
> > > ovs or veth, with the relevant forwarding operation happening in a
> > > softirq on the same CPU originating the packet.
> > > 
> > > RPS is configured (even) on such virtual devices to move the
> > > forwarding away from the relevant CPUs.
> > > 
> > > As Saeed noted, such configuration could possibly be performed via
> > > some user-space daemon monitoring network devices and network
> > > namespaces creation. That will anyway be prone to some race: the
> > > orchestration tool may create and enable the netns and virtual
> > > devices before the daemon has properly set the RPS mask.
> > > 
> > > In the latter scenario some packet forwarding could still slip onto
> > > the relevant CPU, causing measurable latency. In all non-RT scenarios
> > > the above will likely be irrelevant, but in the RT context that is
> > > not acceptable - e.g. it causes latency above the defined limits in
> > > real environments, while the proposed patches avoid the issue.
> > > 
> > > Do you see any other simple way to avoid the above race?
> > > 
> > > Please let me know if the above answers your doubts,  
> > 
> > Thanks, that makes it clearer now.
> > 
> > Depending on how RT-aware your container management is it may or may not
> > be the right place to configure this, as it creates the veth interface.
> > Presumably it's the container management which does the placement of
> > the tasks to cores, why is it not setting other attributes, like RPS?  
> 
> The container orchestration is quite complex, and I'm unsure whether
> isolation and networking configuration are performed (or can be
> performed) by the same process (without a heavy refactor).
> 
> On the other hand, the global rps mask knob looked quite
> straightforward to me.

I understand, but I can't shake the feeling this is a hack.

Whatever sets the CPU isolation should take care of the RPS settings.

> Possibly I can reduce the amount of new code introduced by this
> patchset by removing some code duplication between
> rps_default_mask_sysctl() and flow_limit_cpu_sysctl(). Would that
> make this change more acceptable? Or should I drop this altogether?

I'm leaning towards dropping this altogether, unless you can get some
support/review tags from other netdev developers. So far it
appears we only got a down vote from Saeed.

> > Also I wonder if it would make sense to turn this knob into something
> > more generic. When we arrive at the threaded NAPIs - could it make
> > sense for the threads to inherit your mask as the CPUs they are allowed
> > to run on?  
> 
> I personally *think* this would be fine - and good. But isn't it a bit
> premature to discuss the integration of 2 missing pieces? :)


* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2020-11-04 19:42         ` Jakub Kicinski
@ 2023-01-30  9:25           ` Paolo Abeni
  2023-01-30 21:52             ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2023-01-30  9:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, netdev, Jonathan Corbet, David S. Miller,
	Shuah Khan, linux-doc, linux-kselftest, Marcelo Tosatti,
	Daniel Borkmann

Hi all,

On Wed, 2020-11-04 at 12:42 -0700, Jakub Kicinski wrote:
> On Wed, 04 Nov 2020 18:36:08 +0100 Paolo Abeni wrote:
> > On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:  
> > > > The relevant use case is a host running containers (with the related
> > > > orchestration tools) in a RT environment. Virtual devices (veths, ovs
> > > > ports, etc.) are created by the orchestration tools at run-time.
> > > > Critical processes are allowed to send packets/generate outgoing
> > > > network traffic - but any interrupt is moved away from the related
> > > > cores, so that the usual incoming network traffic processing does not
> > > > happen there.
> > > > 
> > > > Still, a packet transmitted on a virtual device may be forwarded via
> > > > ovs or veth, with the relevant forwarding operation happening in a
> > > > softirq on the same CPU originating the packet.
> > > > 
> > > > RPS is configured (even) on such virtual devices to move the
> > > > forwarding away from the relevant CPUs.
> > > > 
> > > > As Saeed noted, such configuration could possibly be performed via
> > > > some user-space daemon monitoring network devices and network
> > > > namespaces creation. That will anyway be prone to some race: the
> > > > orchestration tool may create and enable the netns and virtual
> > > > devices before the daemon has properly set the RPS mask.
> > > > 
> > > > In the latter scenario some packet forwarding could still slip onto
> > > > the relevant CPU, causing measurable latency. In all non-RT scenarios
> > > > the above will likely be irrelevant, but in the RT context that is
> > > > not acceptable - e.g. it causes latency above the defined limits in
> > > > real environments, while the proposed patches avoid the issue.
> > > > 
> > > > Do you see any other simple way to avoid the above race?
> > > > 
> > > > Please let me know if the above answers your doubts,  
> > > 
> > > Thanks, that makes it clearer now.
> > > 
> > > Depending on how RT-aware your container management is it may or may not
> > > be the right place to configure this, as it creates the veth interface.
> > > Presumably it's the container management which does the placement of
> > > the tasks to cores, why is it not setting other attributes, like RPS?  
> > 
> > The container orchestration is quite complex, and I'm unsure isolation
> > and networking configuration are performed (or can be performed) by the
> > same precess (without an heavy refactor).
> > 
> > On the flip hand, the global rps mask knob looked quite
> > straightforward to me.
> 
> I understand, but I can't shake the feeling this is a hack.
> 
> Whatever sets the CPU isolation should take care of the RPS settings.

Let me try for a moment to revive this old thread.

The series proposed a new sysctl knob to implement a global/default
rps mask applying to all the network devices, as a way to simplify
some RT setups. It has been rejected as the required task is doable
in user-space.

Currently the orchestration infrastructure does that, setting the
per-device, per-queue rps mask and CPU isolation.

The above leads to a side problem: when there are lots of
netns/devices with several queues, even a reasonably optimized
user-space solution takes a considerable amount of time to traverse
the relevant sysfs dirs and do I/O on them. Overall the additional
time required is very measurable, easily reaching several seconds.
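
(The user-space fix essentially boils down to repeating, for every
namespace, a loop like the following sketch - with "3" standing in
for the real mask:

    for ns in $(ip netns list | awk '{print $1}'); do
        ip netns exec "$ns" sh -c \
            'for f in /sys/class/net/*/queues/rx-*/rps_cpus;
             do echo 3 > "$f"; done'
    done

and the sysfs open/write cost adds up quickly with thousands of
queues.)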

The default_rps_mask would basically kill that overhead.

Is the above a suitable use case?

Thanks,

Paolo



* Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask
  2023-01-30  9:25           ` Paolo Abeni
@ 2023-01-30 21:52             ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2023-01-30 21:52 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Saeed Mahameed, netdev, Jonathan Corbet, David S. Miller,
	Shuah Khan, linux-doc, linux-kselftest, Marcelo Tosatti,
	Daniel Borkmann

On Mon, 30 Jan 2023 10:25:34 +0100 Paolo Abeni wrote:
> Let me try for a moment to revive this old thread.
> 
> The series proposed a new sysctl knob to implement a global/default
> rps mask applying to all the network devices, as a way to simplify
> some RT setups. It has been rejected as the required task is doable
> in user-space.
> 
> Currently the orchestration infrastructure does that, setting the
> per-device, per-queue rps mask and CPU isolation.
> 
> The above leads to a side problem: when there are lots of
> netns/devices with several queues, even a reasonably optimized
> user-space solution takes a considerable amount of time to traverse
> the relevant sysfs dirs and do I/O on them. Overall the additional
> time required is very measurable, easily reaching several seconds.
> 
> The default_rps_mask would basically kill that overhead.
> 
> Is the above a suitable use case?

Alright, thanks for trying the user space fix.

