All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/2] Make neighbor eviction controllable by userspace
@ 2021-10-21  0:32 James Prestwood
  2021-10-21  0:32 ` [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter James Prestwood
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: James Prestwood @ 2021-10-21  0:32 UTC (permalink / raw)
  To: netdev
  Cc: James Prestwood, David S . Miller, Jakub Kicinski,
	Jonathan Corbet, Hideaki YOSHIFUJI, David Ahern, Roopa Prabhu,
	Daniel Borkmann, Vladimir Oltean, Ido Schimmel,
	Nikolay Aleksandrov, Chinmay Agarwal, Yajun Deng, Tong Zhu,
	Johannes Berg, Jouni Malinen

v1 -> v2:

 - It was suggested by Daniel Borkmann to extend the neighbor table settings
   rather than adding IPv4/IPv6 options for ARP/NDISC separately. I agree
   this way is much more concise since there is now only one place where the
   option is checked and defined.
 - Moved documentation/code into the same patch
 - Explained in more detail the test scenario and results

v2 -> v3:

 - Renamed 'skip_perm' to 'nocarrier'. The way this parameter is used
   matches this naming.
 - Changed logic to still flush if 'nocarrier' is false.

v3 -> v4:

 - Moved NDTPA_EVICT_NOCARRIER after NDTPA_PAD

v4 -> v5:

 - Went back to the original v1 patchset and changed:
 - Used ANDCONF for IN_DEV macro
 - Got RCU lock prior to __in_dev_get_rcu(). Do note that the logic
   here was extended to handle if __in_dev_get_rcu() fails. If this
   happens the existing behavior should be maintained and set the
   carrier down. I'm unsure if get_rcu() can fail in this context
   though. Similar logic was used for in6_dev_get.
 - Changed ndisc_evict_nocarrier to use a u8, proper handler, and
   set min/max values.


Cc: David S. Miller <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Ido Schimmel <idosch@nvidia.com>
Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
Cc: Chinmay Agarwal <chinagar@codeaurora.org>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Tong Zhu <zhutong@amazon.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jouni Malinen <jouni@codeaurora.org>

James Prestwood (2):
  net: arp: introduce arp_evict_nocarrier sysctl parameter
  net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter

 Documentation/networking/ip-sysctl.rst | 18 ++++++++++++++++++
 include/linux/inetdevice.h             |  2 ++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ip.h                |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 include/uapi/linux/sysctl.h            |  1 +
 net/ipv4/arp.c                         | 13 ++++++++++++-
 net/ipv4/devinet.c                     |  4 ++++
 net/ipv6/addrconf.c                    | 12 ++++++++++++
 net/ipv6/ndisc.c                       |  5 ++++-
 10 files changed, 56 insertions(+), 2 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter
  2021-10-21  0:32 [PATCH v5 0/2] Make neighbor eviction controllable by userspace James Prestwood
@ 2021-10-21  0:32 ` James Prestwood
  2021-10-21 14:08   ` David Ahern
  2021-10-21  0:32 ` [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier " James Prestwood
  2021-10-21  2:33 ` [PATCH v5 0/2] Make neighbor eviction controllable by userspace David Ahern
  2 siblings, 1 reply; 8+ messages in thread
From: James Prestwood @ 2021-10-21  0:32 UTC (permalink / raw)
  To: netdev; +Cc: James Prestwood

This change introduces a new sysctl parameter, arp_evict_nocarrier.
When set (default) the ARP cache will be cleared on a NOCARRIER event.
This new option has been defaulted to '1' which maintains existing
behavior.

Clearing the ARP cache on NOCARRIER is relatively new, introduced by:

commit 859bd2ef1fc1110a8031b967ee656c53a6260a76
Author: David Ahern <dsahern@gmail.com>
Date:   Thu Oct 11 20:33:49 2018 -0700

    net: Evict neighbor entries on carrier down

The reason for this changes is to prevent the ARP cache from being
cleared when a wireless device roams. Specifically for wireless roams
the ARP cache should not be cleared because the underlying network has not
changed. Clearing the ARP cache in this case can introduce significant
delays sending out packets after a roam.

A user reported such a situation here:

https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/

After some investigation it was found that the kernel was holding onto
packets until ARP finished which resulted in this 1 second delay. It
was also found that the first ARP who-has was never responded to,
which is actually what caues the delay. This change is more or less
working around this behavior, but again, there is no reason to clear
the cache on a roam anyways.

As for the unanswered who-has, we know the packet made it OTA since
it was seen while monitoring. Why it never received a response is
unknown. In any case, since this is a problem on the AP side of things
all that can be done is to work around it until it is solved.

Some background on testing/reproducing the packet delay:

Hardware:
 - 2 access points configured for Fast BSS Transition (Though I don't
   see why regular reassociation wouldn't have the same behavior)
 - Wireless station running IWD as supplicant
 - A device on network able to respond to pings (I used one of the APs)

Procedure:
 - Connect to first AP
 - Ping once to establish an ARP entry
 - Start a tcpdump
 - Roam to second AP
 - Wait for operstate UP event, and note the timestamp
 - Start pinging

Results:

Below is the tcpdump after UP. It was recorded the interface went UP at
10:42:01.432875.

10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64

You can see the first ARP who-has went out very quickly after UP, but
was never responded to. Nearly a second later the kernel retries and
gets a response. Only then do the ping packets go out. If an ARP entry
is manually added prior to UP (after the cache is cleared) it is seen
that the first ping is never responded to, so its not only an issue with
ARP but with data packets in general.

As mentioned prior, the wireless interface was also monitored to verify
the ping/ARP packet made it OTA which was observed to be true.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
---
 Documentation/networking/ip-sysctl.rst |  9 +++++++++
 include/linux/inetdevice.h             |  2 ++
 include/uapi/linux/ip.h                |  1 +
 include/uapi/linux/sysctl.h            |  1 +
 net/ipv4/arp.c                         | 13 ++++++++++++-
 net/ipv4/devinet.c                     |  4 ++++
 6 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 16b8bf72feaf..18fde4ed7a5e 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1611,6 +1611,15 @@ arp_accept - BOOLEAN
 	gratuitous arp frame, the arp table will be updated regardless
 	if this setting is on or off.
 
+arp_evict_nocarrier - BOOLEAN
+	Clears the ARP cache on NOCARRIER events. This option is important for
+	wireless devices where the ARP cache should not be cleared when roaming
+	between access points on the same network. In most cases this should
+	remain as the default (1).
+
+	- 1 - (default): Clear the ARP cache on NOCARRIER events
+	- 0 - Do not clear ARP cache on NOCARRIER events
+
 mcast_solicit - INTEGER
 	The maximum number of multicast probes in INCOMPLETE state,
 	when the associated hardware address is unknown.  Defaults
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index a038feb63f23..518b484a7f07 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -133,6 +133,8 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
 #define IN_DEV_ARP_ANNOUNCE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
 #define IN_DEV_ARP_IGNORE(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
 #define IN_DEV_ARP_NOTIFY(in_dev)	IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_ARP_EVICT_NOCARRIER(in_dev) IN_DEV_ANDCONF((in_dev), \
+							  ARP_EVICT_NOCARRIER)
 
 struct in_ifaddr {
 	struct hlist_node	hash;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index e42d13b55cf3..e00bbb9c47bb 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -169,6 +169,7 @@ enum
 	IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
 	IPV4_DEVCONF_DROP_GRATUITOUS_ARP,
 	IPV4_DEVCONF_BC_FORWARDING,
+	IPV4_DEVCONF_ARP_EVICT_NOCARRIER,
 	__IPV4_DEVCONF_MAX
 };
 
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 1e05d3caa712..6a3b194c50fe 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -482,6 +482,7 @@ enum
 	NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
 	NET_IPV4_CONF_ARP_ACCEPT=21,
 	NET_IPV4_CONF_ARP_NOTIFY=22,
+	NET_IPV4_CONF_ARP_EVICT_NOCARRIER=23,
 };
 
 /* /proc/sys/net/ipv4/netfilter */
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 922dd73e5740..4d3b8d1463b7 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1247,6 +1247,8 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event,
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct netdev_notifier_change_info *change_info;
+	struct in_device *in_dev;
+	bool evict_nocarrier;
 
 	switch (event) {
 	case NETDEV_CHANGEADDR:
@@ -1257,7 +1259,16 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event,
 		change_info = ptr;
 		if (change_info->flags_changed & IFF_NOARP)
 			neigh_changeaddr(&arp_tbl, dev);
-		if (!netif_carrier_ok(dev))
+
+		rcu_read_lock();
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			evict_nocarrier = true;
+		else
+			evict_nocarrier = IN_DEV_ARP_EVICT_NOCARRIER(in_dev);
+		rcu_read_unlock();
+
+		if (evict_nocarrier && !netif_carrier_ok(dev))
 			neigh_carrier_down(&arp_tbl, dev);
 		break;
 	default:
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f4468980b675..ec73a0d52d3e 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -75,6 +75,7 @@ static struct ipv4_devconf ipv4_devconf = {
 		[IPV4_DEVCONF_SHARED_MEDIA - 1] = 1,
 		[IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL - 1] = 10000 /*ms*/,
 		[IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL - 1] =  1000 /*ms*/,
+		[IPV4_DEVCONF_ARP_EVICT_NOCARRIER - 1] = 1,
 	},
 };
 
@@ -87,6 +88,7 @@ static struct ipv4_devconf ipv4_devconf_dflt = {
 		[IPV4_DEVCONF_ACCEPT_SOURCE_ROUTE - 1] = 1,
 		[IPV4_DEVCONF_IGMPV2_UNSOLICITED_REPORT_INTERVAL - 1] = 10000 /*ms*/,
 		[IPV4_DEVCONF_IGMPV3_UNSOLICITED_REPORT_INTERVAL - 1] =  1000 /*ms*/,
+		[IPV4_DEVCONF_ARP_EVICT_NOCARRIER - 1] = 1,
 	},
 };
 
@@ -2532,6 +2534,8 @@ static struct devinet_sysctl_table {
 		DEVINET_SYSCTL_RW_ENTRY(ARP_IGNORE, "arp_ignore"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
 		DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
+		DEVINET_SYSCTL_RW_ENTRY(ARP_EVICT_NOCARRIER,
+					"arp_evict_nocarrier"),
 		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
 		DEVINET_SYSCTL_RW_ENTRY(FORCE_IGMP_VERSION,
 					"force_igmp_version"),
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  2021-10-21  0:32 [PATCH v5 0/2] Make neighbor eviction controllable by userspace James Prestwood
  2021-10-21  0:32 ` [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter James Prestwood
@ 2021-10-21  0:32 ` James Prestwood
  2021-10-21 14:15   ` David Ahern
  2021-10-21  2:33 ` [PATCH v5 0/2] Make neighbor eviction controllable by userspace David Ahern
  2 siblings, 1 reply; 8+ messages in thread
From: James Prestwood @ 2021-10-21  0:32 UTC (permalink / raw)
  To: netdev; +Cc: James Prestwood

In most situations the neighbor discovery cache should be cleared on a
NOCARRIER event which is currently done unconditionally. But for wireless
roams the neighbor discovery cache can and should remain intact since
the underlying network has not changed.

This patch introduces a sysctl option ndisc_evict_nocarrier which can
be disabled by a wireless supplicant during a roam. This allows packets
to be sent after a roam immediately without having to wait for
neighbor discovery.

A user reported roughly a 1 second delay after a roam before packets
could be sent out (note, on IPv4). This delay was due to the ARP
cache being cleared. During testing of this same scenario using IPv6
no delay was noticed, but regardless there is no reason to clear
the ndisc cache for wireless roams.

Signed-off-by: James Prestwood <prestwoj@gmail.com>
---
 Documentation/networking/ip-sysctl.rst |  9 +++++++++
 include/linux/ipv6.h                   |  1 +
 include/uapi/linux/ipv6.h              |  1 +
 net/ipv6/addrconf.c                    | 12 ++++++++++++
 net/ipv6/ndisc.c                       |  9 ++++++++-
 5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 18fde4ed7a5e..c61cc0219f4c 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2350,6 +2350,15 @@ ndisc_tclass - INTEGER
 
 	* 0 - (default)
 
+ndisc_evict_nocarrier - BOOLEAN
+	Clears the neighbor discovery table on NOCARRIER events. This option is
+	important for wireless devices where the neighbor discovery cache should
+	not be cleared when roaming between access points on the same network.
+	In most cases this should remain as the default (1).
+
+	- 1 - (default): Clear neighbor discover cache on NOCARRIER events.
+	- 0 - Do not clear neighbor discovery cache on NOCARRIER events.
+
 mldv1_unsolicited_report_interval - INTEGER
 	The interval in milliseconds in which the next unsolicited
 	MLDv1 report retransmit will take place.
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ef4a69865737..753e5c0db2a3 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -79,6 +79,7 @@ struct ipv6_devconf {
 	__u32		ioam6_id;
 	__u32		ioam6_id_wide;
 	__u8		ioam6_enabled;
+	__u8		ndisc_evict_nocarrier;
 
 	struct ctl_table_header *sysctl_header;
 };
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index b243a53fa985..d4178dace0bf 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -193,6 +193,7 @@ enum {
 	DEVCONF_IOAM6_ENABLED,
 	DEVCONF_IOAM6_ID,
 	DEVCONF_IOAM6_ID_WIDE,
+	DEVCONF_NDISC_EVICT_NOCARRIER,
 	DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d4fae16deec4..398294aa8348 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -241,6 +241,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 	.ioam6_enabled		= 0,
 	.ioam6_id               = IOAM6_DEFAULT_IF_ID,
 	.ioam6_id_wide		= IOAM6_DEFAULT_IF_ID_WIDE,
+	.ndisc_evict_nocarrier	= 1,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -300,6 +301,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.ioam6_enabled		= 0,
 	.ioam6_id               = IOAM6_DEFAULT_IF_ID,
 	.ioam6_id_wide		= IOAM6_DEFAULT_IF_ID_WIDE,
+	.ndisc_evict_nocarrier	= 1,
 };
 
 /* Check if link is ready: is it up and is a valid qdisc available */
@@ -5542,6 +5544,7 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 	array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
 	array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
 	array[DEVCONF_IOAM6_ID_WIDE] = cnf->ioam6_id_wide;
+	array[DEVCONF_NDISC_EVICT_NOCARRIER] = cnf->ndisc_evict_nocarrier;
 }
 
 static inline size_t inet6_ifla6_size(void)
@@ -6983,6 +6986,15 @@ static const struct ctl_table addrconf_sysctl[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_douintvec,
 	},
+	{
+		.procname	= "ndisc_evict_nocarrier",
+		.data		= &ipv6_devconf.ndisc_evict_nocarrier,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= (void *)SYSCTL_ZERO,
+		.extra2		= (void *)SYSCTL_ONE,
+	},
 	{
 		/* sentinel */
 	}
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 184190b9ea25..4db58c29ab53 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1794,6 +1794,7 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
 	struct netdev_notifier_change_info *change_info;
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
+	bool evict_nocarrier;
 
 	switch (event) {
 	case NETDEV_CHANGEADDR:
@@ -1810,10 +1811,16 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
 		in6_dev_put(idev);
 		break;
 	case NETDEV_CHANGE:
+		idev = in6_dev_get(dev);
+		if (!idev)
+			evict_nocarrier = true;
+		else
+			evict_nocarrier = idev->cnf.ndisc_evict_nocarrier;
+
 		change_info = ptr;
 		if (change_info->flags_changed & IFF_NOARP)
 			neigh_changeaddr(&nd_tbl, dev);
-		if (!netif_carrier_ok(dev))
+		if (evict_nocarrier && !netif_carrier_ok(dev))
 			neigh_carrier_down(&nd_tbl, dev);
 		break;
 	case NETDEV_DOWN:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Make neighbor eviction controllable by userspace
  2021-10-21  0:32 [PATCH v5 0/2] Make neighbor eviction controllable by userspace James Prestwood
  2021-10-21  0:32 ` [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter James Prestwood
  2021-10-21  0:32 ` [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier " James Prestwood
@ 2021-10-21  2:33 ` David Ahern
  2021-10-21 21:59   ` James Prestwood
  2 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2021-10-21  2:33 UTC (permalink / raw)
  To: James Prestwood, netdev
  Cc: David S . Miller, Jakub Kicinski, Jonathan Corbet,
	Hideaki YOSHIFUJI, David Ahern, Roopa Prabhu, Daniel Borkmann,
	Vladimir Oltean, Ido Schimmel, Nikolay Aleksandrov,
	Chinmay Agarwal, Yajun Deng, Tong Zhu, Johannes Berg,
	Jouni Malinen

On 10/20/21 6:32 PM, James Prestwood wrote:
> v1 -> v2:
> 
>  - It was suggested by Daniel Borkmann to extend the neighbor table settings
>    rather than adding IPv4/IPv6 options for ARP/NDISC separately. I agree
>    this way is much more concise since there is now only one place where the
>    option is checked and defined.
>  - Moved documentation/code into the same patch
>  - Explained in more detail the test scenario and results
> 
> v2 -> v3:
> 
>  - Renamed 'skip_perm' to 'nocarrier'. The way this parameter is used
>    matches this naming.
>  - Changed logic to still flush if 'nocarrier' is false.
> 
> v3 -> v4:
> 
>  - Moved NDTPA_EVICT_NOCARRIER after NDTPA_PAD
> 
> v4 -> v5:
> 
>  - Went back to the original v1 patchset and changed:
>  - Used ANDCONF for IN_DEV macro
>  - Got RCU lock prior to __in_dev_get_rcu(). Do note that the logic
>    here was extended to handle if __in_dev_get_rcu() fails. If this
>    happens the existing behavior should be maintained and set the
>    carrier down. I'm unsure if get_rcu() can fail in this context
>    though. Similar logic was used for in6_dev_get.
>  - Changed ndisc_evict_nocarrier to use a u8, proper handler, and
>    set min/max values.
> 

I'll take a deep dive on the patches tomorrow.

You need to add a selftests script under tools/testing/selftests/net
that shows this behavior with the new setting set and unset. This is
easily done with veth pairs and network namespaces (one end of the veth
pair down sets the other into no-carrier). Take a look at the scripts
there - e.g., fib_nexthops.sh should provide a template for a start point.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter
  2021-10-21  0:32 ` [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter James Prestwood
@ 2021-10-21 14:08   ` David Ahern
  0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2021-10-21 14:08 UTC (permalink / raw)
  To: James Prestwood, netdev

On 10/20/21 6:32 PM, James Prestwood wrote:
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 922dd73e5740..4d3b8d1463b7 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -1257,7 +1259,16 @@ static int arp_netdev_event(struct notifier_block *this, unsigned long event,
>  		change_info = ptr;
>  		if (change_info->flags_changed & IFF_NOARP)
>  			neigh_changeaddr(&arp_tbl, dev);
> -		if (!netif_carrier_ok(dev))
> +
> +		rcu_read_lock();
> +		in_dev = __in_dev_get_rcu(dev);
> +		if (!in_dev)
> +			evict_nocarrier = true;
> +		else
> +			evict_nocarrier = IN_DEV_ARP_EVICT_NOCARRIER(in_dev);
> +		rcu_read_unlock();

I believe the rtnl lock is held, so you can use __in_dev_get_rtnl and
avoid the rcu_read_{,un}lock.

> +
> +		if (evict_nocarrier && !netif_carrier_ok(dev))
>  			neigh_carrier_down(&arp_tbl, dev);
>  		break;
>  	default:


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  2021-10-21  0:32 ` [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier " James Prestwood
@ 2021-10-21 14:15   ` David Ahern
  0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2021-10-21 14:15 UTC (permalink / raw)
  To: James Prestwood, netdev

On 10/20/21 6:32 PM, James Prestwood wrote:
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 184190b9ea25..4db58c29ab53 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -1810,10 +1811,16 @@ static int ndisc_netdev_event(struct notifier_block *this, unsigned long event,
>  		in6_dev_put(idev);
>  		break;
>  	case NETDEV_CHANGE:
> +		idev = in6_dev_get(dev);
> +		if (!idev)
> +			evict_nocarrier = true;
> +		else
> +			evict_nocarrier = idev->cnf.ndisc_evict_nocarrier;
> +

missing in6_dev_put here


>  		change_info = ptr;
>  		if (change_info->flags_changed & IFF_NOARP)
>  			neigh_changeaddr(&nd_tbl, dev);
> -		if (!netif_carrier_ok(dev))
> +		if (evict_nocarrier && !netif_carrier_ok(dev))
>  			neigh_carrier_down(&nd_tbl, dev);
>  		break;
>  	case NETDEV_DOWN:
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Make neighbor eviction controllable by userspace
  2021-10-21  2:33 ` [PATCH v5 0/2] Make neighbor eviction controllable by userspace David Ahern
@ 2021-10-21 21:59   ` James Prestwood
  2021-10-21 22:09     ` David Ahern
  0 siblings, 1 reply; 8+ messages in thread
From: James Prestwood @ 2021-10-21 21:59 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: David S . Miller, Jakub Kicinski, Jonathan Corbet,
	Hideaki YOSHIFUJI, David Ahern, Roopa Prabhu, Daniel Borkmann,
	Vladimir Oltean, Ido Schimmel, Nikolay Aleksandrov,
	Chinmay Agarwal, Yajun Deng, Tong Zhu, Johannes Berg,
	Jouni Malinen

Hi David,

On Wed, 2021-10-20 at 20:33 -0600, David Ahern wrote:
> On 10/20/21 6:32 PM, James Prestwood wrote:
> > v1 -> v2:
> > 
> >  - It was suggested by Daniel Borkmann to extend the neighbor table
> > settings
> >    rather than adding IPv4/IPv6 options for ARP/NDISC separately. I
> > agree
> >    this way is much more concise since there is now only one place
> > where the
> >    option is checked and defined.
> >  - Moved documentation/code into the same patch
> >  - Explained in more detail the test scenario and results
> > 
> > v2 -> v3:
> > 
> >  - Renamed 'skip_perm' to 'nocarrier'. The way this parameter is
> > used
> >    matches this naming.
> >  - Changed logic to still flush if 'nocarrier' is false.
> > 
> > v3 -> v4:
> > 
> >  - Moved NDTPA_EVICT_NOCARRIER after NDTPA_PAD
> > 
> > v4 -> v5:
> > 
> >  - Went back to the original v1 patchset and changed:
> >  - Used ANDCONF for IN_DEV macro
> >  - Got RCU lock prior to __in_dev_get_rcu(). Do note that the logic
> >    here was extended to handle if __in_dev_get_rcu() fails. If this
> >    happens the existing behavior should be maintained and set the
> >    carrier down. I'm unsure if get_rcu() can fail in this context
> >    though. Similar logic was used for in6_dev_get.
> >  - Changed ndisc_evict_nocarrier to use a u8, proper handler, and
> >    set min/max values.
> > 
> 
> I'll take a deep dive on the patches tomorrow.
> 
> You need to add a selftests script under tools/testing/selftests/net
> that shows this behavior with the new setting set and unset. This is
> easily done with veth pairs and network namespaces (one end of the
> veth
> pair down sets the other into no-carrier). Take a look at the scripts
> there - e.g., fib_nexthops.sh should provide a template for a start
> point.

So the test itself is pretty simple. The part I'm unsure about is how
you actually set the carrier state from userspace. I see "ip link set
<dev> carrier {on,off}" but this reports not supported for veths,
wlans, and eth interfaces I have tried. AFAIK the driver controls the
carrier state. Maybe some drivers do support this?

Is there a way to set the carrier state that you, or anyone is aware
of?

Thanks,
James


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v5 0/2] Make neighbor eviction controllable by userspace
  2021-10-21 21:59   ` James Prestwood
@ 2021-10-21 22:09     ` David Ahern
  0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2021-10-21 22:09 UTC (permalink / raw)
  To: James Prestwood, netdev
  Cc: David S . Miller, Jakub Kicinski, Jonathan Corbet,
	Hideaki YOSHIFUJI, David Ahern, Roopa Prabhu, Daniel Borkmann,
	Vladimir Oltean, Ido Schimmel, Nikolay Aleksandrov,
	Chinmay Agarwal, Yajun Deng, Tong Zhu, Johannes Berg,
	Jouni Malinen

On 10/21/21 3:59 PM, James Prestwood wrote:
> So the test itself is pretty simple. The part I'm unsure about is how
> you actually set the carrier state from userspace. I see "ip link set
> <dev> carrier {on,off}" but this reports not supported for veths,
> wlans, and eth interfaces I have tried. AFAIK the driver controls the
> carrier state. Maybe some drivers do support this?
> 
> Is there a way to set the carrier state that you, or anyone is aware
> of?

with veth pairs, if you set one down the other goes into no-carrier mode

make veth0 the one you are validating and put veth1 in a namespace
called testing. Then 'ip -netns testing li set veth1 down' put veth0 in
NO_CARRIER

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-10-21 22:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-21  0:32 [PATCH v5 0/2] Make neighbor eviction controllable by userspace James Prestwood
2021-10-21  0:32 ` [PATCH v5 1/2] net: arp: introduce arp_evict_nocarrier sysctl parameter James Prestwood
2021-10-21 14:08   ` David Ahern
2021-10-21  0:32 ` [PATCH v5 2/2] net: ndisc: introduce ndisc_evict_nocarrier " James Prestwood
2021-10-21 14:15   ` David Ahern
2021-10-21  2:33 ` [PATCH v5 0/2] Make neighbor eviction controllable by userspace David Ahern
2021-10-21 21:59   ` James Prestwood
2021-10-21 22:09     ` David Ahern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.