linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net: route: improve route hinting
@ 2024-03-07 17:11 Leone Fernando
  2024-03-07 17:11 ` [PATCH net-next 1/4] net: route: expire rt if the dst it holds is expired Leone Fernando
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Leone Fernando @ 2024-03-07 17:11 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemb
  Cc: netdev, linux-kernel, Leone Fernando

In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
sub-system. The hinting optimization improves performance by reusing
previously found dsts instead of looking them up for each skb.

This patch series introduces a generalized version of the hinting mechanism that
can "remember" a larger number of dsts. This reduces the number of dst
lookups for frequently encountered daddrs.

Before diving into the code and the benchmarking results, it's important
to address the deletion of the old route cache [2] and why
this solution is different. The original cache was complicated,
vulnerable to DOS attacks and had unstable performance.

The new input dst_cache is much simpler thanks to its lazy approach,
improving performance without the overhead of the removed cache
implementation. Instead of using timers and GC, the deletion of invalid
entries is performed lazily during their lookups.
The dsts are stored in a simple, lightweight, static hash table. This
keeps the lookup times fast yet stable, preventing DOS upon cache misses.
The new input dst_cache implementation is built over the existing
dst_cache code which supplies a fast lockless percpu behavior.

I tested this patch using udp floods with different number of daddrs.
The benchmarking setup is comprised of 3 machines: a sender,
a forwarder and a receiver. I measured the PPS received by the receiver
as the forwarder was running either the mainline kernel or the patched
kernel, comparing the results. The dst_cache I tested in this benchmark
used a total of 512 hash table entries, split into buckets of 4
entries each.

These are the results:
  UDP             mainline              patched                   delta
conns pcpu         Kpps                  Kpps                       %
   1              274.0255              269.2205                  -1.75
   2              257.3748              268.0947                   4.17
  15              241.3513              258.8016                   7.23
 100              238.3419              258.4939                   8.46
 500              238.5390              252.6425                   5.91
1000              238.7570              242.1820                   1.43
2000              238.7780              236.2640                  -1.05
4000              239.0440              233.5320                  -2.31
8000              239.3248              232.5680                  -2.82

As you can see, this patch improves performance up until ~1500
connections, after which the rate of improvement diminishes
due to the growing number of cache misses.
It's important to note that in the worst scenario, every packet will
cause a cache miss, resulting in only a constant performance degradation
due to the fixed cache and bucket sizes. This means that the cache is
resistant to DOS attacks.

Based on the above measurements, it seems that the performance
degradation flattens at around 3%. Note that the number of concurrent
connections at which performance starts to degrade depends on the cache
size and the amount of cpus.

[1] https://lore.kernel.org/netdev/cover.1574252982.git.pabeni@redhat.com/
[2] https://lore.kernel.org/netdev/20120720.142502.1144557295933737451.davem@davemloft.net/

v1:
- fix typo while allocating per-cpu cache
- while using dst from the dst_cache set IPSKB_DOREDIRECT correctly
- always compile dst_cache

RFC-v2:
- remove unnecessary macro
- move inline to .h file

RFC-v1: https://lore.kernel.org/netdev/d951b371-4138-4bda-a1c5-7606a28c81f0@gmail.com/
RFC-v2: https://lore.kernel.org/netdev/3a17c86d-08a5-46d2-8622-abc13d4a411e@gmail.com/

Leone Fernando (4):
  net: route: expire rt if the dst it holds is expired
  net: dst_cache: add input_dst_cache API
  net: route: always compile dst_cache
  net: route: replace route hints with input_dst_cache

 drivers/net/Kconfig        |   1 -
 include/net/dst_cache.h    |  68 +++++++++++++++++++
 include/net/dst_metadata.h |   2 -
 include/net/ip_tunnels.h   |   2 -
 include/net/route.h        |   6 +-
 net/Kconfig                |   4 --
 net/core/Makefile          |   3 +-
 net/core/dst.c             |   4 --
 net/core/dst_cache.c       | 132 +++++++++++++++++++++++++++++++++++++
 net/ipv4/Kconfig           |   1 -
 net/ipv4/ip_input.c        |  58 ++++++++--------
 net/ipv4/ip_tunnel_core.c  |   4 --
 net/ipv4/route.c           |  75 +++++++++++++++------
 net/ipv4/udp_tunnel_core.c |   4 --
 net/ipv6/Kconfig           |   4 --
 net/ipv6/ip6_udp_tunnel.c  |   4 --
 net/netfilter/nft_tunnel.c |   2 -
 net/openvswitch/Kconfig    |   1 -
 net/sched/act_tunnel_key.c |   2 -
 19 files changed, 291 insertions(+), 86 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/4] net: route: expire rt if the dst it holds is expired
  2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
@ 2024-03-07 17:11 ` Leone Fernando
  2024-03-07 17:12 ` [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API Leone Fernando
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Leone Fernando @ 2024-03-07 17:11 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemb
  Cc: netdev, linux-kernel, Leone Fernando

The function rt_is_expired is used to verify that a cached dst is valid.
Currently, this function ignores the rt.dst->expires value.

Add a check to rt_is_expired that validates that the dst is not expired.

Signed-off-by: Leone Fernando <leone4fernando@gmail.com>
---
 net/ipv4/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c8f76f56dc16..73e742424e97 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -392,7 +392,8 @@ static inline int ip_rt_proc_init(void)
 
 static inline bool rt_is_expired(const struct rtable *rth)
 {
-	return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
+	return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev)) ||
+	       (rth->dst.expires && time_after(jiffies, rth->dst.expires));
 }
 
 void rt_cache_flush(struct net *net)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API
  2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
  2024-03-07 17:11 ` [PATCH net-next 1/4] net: route: expire rt if the dst it holds is expired Leone Fernando
@ 2024-03-07 17:12 ` Leone Fernando
  2024-03-09  3:55   ` Jakub Kicinski
  2024-03-07 17:12 ` [PATCH net-next 3/4] net: route: always compile dst_cache Leone Fernando
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Leone Fernando @ 2024-03-07 17:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemb
  Cc: netdev, linux-kernel, Leone Fernando

The input_dst_cache allows fast lookup of frequently encountered dsts.

In order to provide stable results, I implemented a simple linear
hashtable with each bucket containing a constant amount of
entries (DST_CACHE_INPUT_BUCKET_SIZE).

Similarly to how the route hint is used, I defined the hashtable key to
contain the daddr and the tos of the IP header.

Lookup is performed in a straightforward manner: start at the bucket head
corresponding the hashed key and search the following
DST_CACHE_INPUT_BUCKET_SIZE entries of the array for a matching key.

When inserting a new dst to the cache, if all the bucket entries are
full, the oldest one is deleted to make room for the new dst.

Signed-off-by: Leone Fernando <leone4fernando@gmail.com>
---
 include/net/dst_cache.h |  68 +++++++++++++++++++++
 net/core/dst_cache.c    | 132 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 200 insertions(+)

diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index df6622a5fe98..1c3b06f08d7d 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -7,12 +7,40 @@
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ip6_fib.h>
 #endif
+#include <net/ip.h>
+
+#define DST_CACHE_INPUT_SHIFT (9)
+#define DST_CACHE_INPUT_SIZE (1 << DST_CACHE_INPUT_SHIFT)
+#define DST_CACHE_INPUT_BUCKET_SIZE (4)
+#define DST_CACHE_INPUT_HASH_MASK (~(DST_CACHE_INPUT_BUCKET_SIZE - 1))
+#define INVALID_DST_CACHE_INPUT_KEY (~(u64)(0))
 
 struct dst_cache {
 	struct dst_cache_pcpu __percpu *cache;
 	unsigned long reset_ts;
 };
 
+extern unsigned int dst_cache_net_id __read_mostly;
+
+/**
+ * idst_for_each_in_bucket - iterate over a dst cache bucket
+ * @pos:	the type * to use as a loop cursor
+ * @head:	the head of the cpu dst cache.
+ * @hash:	the hash of the bucket
+ */
+#define idst_for_each_in_bucket(pos, head, hash)		\
+	for (pos = &head[hash];					\
+	     pos < &head[hash + DST_CACHE_INPUT_BUCKET_SIZE];	\
+	     pos++)
+
+/**
+ * idst_for_each_in_cache - iterate over the dst cache
+ * @pos:	the type * to use as a loop cursor
+ * @head:	the head of the cpu dst cache.
+ */
+#define idst_for_each_in_cache(pos, head)				\
+	for (pos = head; pos < head + DST_CACHE_INPUT_SIZE; pos++)
+
 /**
  *	dst_cache_get - perform cache lookup
  *	@dst_cache: the cache
@@ -106,4 +134,44 @@ int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp);
  */
 void dst_cache_destroy(struct dst_cache *dst_cache);
 
+/**
+ *	dst_cache_input_get_noref - perform lookup in the input cache,
+ *	return a noref dst
+ *	@dst_cache: the input cache
+ *	@skb: the packet according to which the dst entry will be searched
+ *	local BH must be disabled.
+ */
+struct dst_entry *dst_cache_input_get_noref(struct dst_cache *dst_cache,
+					    struct sk_buff *skb);
+
+/**
+ *	dst_cache_input_add - add the dst of the given skb to the input cache.
+ *
+ *	in case the cache bucket is full, the oldest entry will be deleted
+ *	and replaced with the new one.
+ *	@dst_cache: the input cache
+ *	@skb: The packet according to which the dst entry will be searched
+ *
+ *	local BH must be disabled.
+ */
+void dst_cache_input_add(struct dst_cache *dst_cache,
+			 const struct sk_buff *skb);
+
+/**
+ *	dst_cache_input_init - initialize the input cache,
+ *	allocating the required storage
+ */
+int __init dst_cache_input_init(void);
+
+static inline u64 create_dst_cache_key_ip4(const struct sk_buff *skb)
+{
+	struct iphdr *iphdr = ip_hdr(skb);
+
+	return (((u64)iphdr->daddr) << 8) | iphdr->tos;
+}
+
+static inline u32 hash_dst_cache_key(u64 key)
+{
+	return hash_64(key, DST_CACHE_INPUT_SHIFT) & DST_CACHE_INPUT_HASH_MASK;
+}
 #endif
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 0ccfd5fa5cb9..245f5546a7ff 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -13,6 +13,7 @@
 #include <net/ip6_fib.h>
 #endif
 #include <uapi/linux/in.h>
+#include <net/netns/generic.h>
 
 struct dst_cache_pcpu {
 	unsigned long refresh_ts;
@@ -21,9 +22,12 @@ struct dst_cache_pcpu {
 	union {
 		struct in_addr in_saddr;
 		struct in6_addr in6_saddr;
+		u64 key;
 	};
 };
 
+unsigned int dst_cache_net_id __read_mostly;
+
 static void dst_cache_per_cpu_dst_set(struct dst_cache_pcpu *dst_cache,
 				      struct dst_entry *dst, u32 cookie)
 {
@@ -181,3 +185,131 @@ void dst_cache_reset_now(struct dst_cache *dst_cache)
 	}
 }
 EXPORT_SYMBOL_GPL(dst_cache_reset_now);
+
+static void dst_cache_input_set(struct dst_cache_pcpu *idst,
+				struct dst_entry *dst, u64 key)
+{
+	dst_cache_per_cpu_dst_set(idst, dst, 0);
+	idst->key = key;
+	idst->refresh_ts = jiffies;
+}
+
+static struct dst_entry *__dst_cache_input_get_noref(struct dst_cache_pcpu *idst)
+{
+	struct dst_entry *dst = idst->dst;
+
+	if (unlikely(dst->obsolete && !dst->ops->check(dst, idst->cookie))) {
+		dst_cache_input_set(idst, NULL, INVALID_DST_CACHE_INPUT_KEY);
+		goto fail;
+	}
+
+	idst->refresh_ts = jiffies;
+	return dst;
+
+fail:
+	return NULL;
+}
+
+struct dst_entry *dst_cache_input_get_noref(struct dst_cache *dst_cache,
+					    struct sk_buff *skb)
+{
+	struct dst_entry *out_dst = NULL;
+	struct dst_cache_pcpu *pcpu_cache;
+	struct dst_cache_pcpu *idst;
+	u32 hash;
+	u64 key;
+
+	pcpu_cache = this_cpu_ptr(dst_cache->cache);
+	key = create_dst_cache_key_ip4(skb);
+	hash = hash_dst_cache_key(key);
+	idst_for_each_in_bucket(idst, pcpu_cache, hash) {
+		if (key == idst->key) {
+			out_dst = __dst_cache_input_get_noref(idst);
+			goto out;
+		}
+	}
+out:
+	return out_dst;
+}
+
+static void dst_cache_input_reset_now(struct dst_cache *dst_cache)
+{
+	struct dst_cache_pcpu *caches;
+	struct dst_cache_pcpu *idst;
+	struct dst_entry *dst;
+	int i;
+
+	for_each_possible_cpu(i) {
+		caches = per_cpu_ptr(dst_cache->cache, i);
+		idst_for_each_in_cache(idst, caches) {
+			idst->key = INVALID_DST_CACHE_INPUT_KEY;
+			dst = idst->dst;
+			if (dst)
+				dst_release(dst);
+		}
+	}
+}
+
+static int __net_init dst_cache_input_net_init(struct net *net)
+{
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
+
+	dst_cache->cache = (struct dst_cache_pcpu __percpu *)alloc_percpu_gfp(struct dst_cache_pcpu[DST_CACHE_INPUT_SIZE],
+									      GFP_KERNEL | __GFP_ZERO);
+	if (!dst_cache->cache)
+		return -ENOMEM;
+
+	dst_cache_input_reset_now(dst_cache);
+	return 0;
+}
+
+static void __net_exit dst_cache_input_net_exit(struct net *net)
+{
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
+
+	dst_cache_input_reset_now(dst_cache);
+	free_percpu(dst_cache->cache);
+	dst_cache->cache = NULL;
+}
+
+static bool idst_empty(struct dst_cache_pcpu *idst)
+{
+	return idst->key == INVALID_DST_CACHE_INPUT_KEY;
+}
+
+void dst_cache_input_add(struct dst_cache *dst_cache, const struct sk_buff *skb)
+{
+	struct dst_cache_pcpu *entry = NULL;
+	struct dst_cache_pcpu *pcpu_cache;
+	struct dst_cache_pcpu *idst;
+	u32 hash;
+	u64 key;
+
+	pcpu_cache = this_cpu_ptr(dst_cache->cache);
+	key = create_dst_cache_key_ip4(skb);
+	hash = hash_dst_cache_key(key);
+	idst_for_each_in_bucket(idst, pcpu_cache, hash) {
+		if (idst_empty(idst)) {
+			entry = idst;
+			goto add_to_cache;
+		}
+		if (!entry || time_before(idst->refresh_ts, entry->refresh_ts))
+			entry = idst;
+	}
+
+add_to_cache:
+	dst_cache_input_set(entry, skb_dst(skb), key);
+}
+
+static struct pernet_operations dst_cache_input_ops __net_initdata = {
+	.init = dst_cache_input_net_init,
+	.exit = dst_cache_input_net_exit,
+	.id   = &dst_cache_net_id,
+	.size = sizeof(struct dst_cache),
+};
+
+int __init dst_cache_input_init(void)
+{
+	return register_pernet_subsys(&dst_cache_input_ops);
+}
+subsys_initcall(dst_cache_input_init);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 3/4] net: route: always compile dst_cache
  2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
  2024-03-07 17:11 ` [PATCH net-next 1/4] net: route: expire rt if the dst it holds is expired Leone Fernando
  2024-03-07 17:12 ` [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API Leone Fernando
@ 2024-03-07 17:12 ` Leone Fernando
  2024-03-07 17:12 ` [PATCH net-next 4/4] net: route: replace route hints with input_dst_cache Leone Fernando
  2024-03-09  4:53 ` [PATCH net-next 0/4] net: route: improve route hinting David Ahern
  4 siblings, 0 replies; 13+ messages in thread
From: Leone Fernando @ 2024-03-07 17:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemb
  Cc: netdev, linux-kernel, Leone Fernando

make dst_cache always compile, delete all relevant ifdefs

Signed-off-by: Leone Fernando <leone4fernando@gmail.com>
---
 drivers/net/Kconfig        | 1 -
 include/net/dst_metadata.h | 2 --
 include/net/ip_tunnels.h   | 2 --
 net/Kconfig                | 4 ----
 net/core/Makefile          | 3 +--
 net/core/dst.c             | 4 ----
 net/ipv4/Kconfig           | 1 -
 net/ipv4/ip_tunnel_core.c  | 4 ----
 net/ipv4/udp_tunnel_core.c | 4 ----
 net/ipv6/Kconfig           | 4 ----
 net/ipv6/ip6_udp_tunnel.c  | 4 ----
 net/netfilter/nft_tunnel.c | 2 --
 net/openvswitch/Kconfig    | 1 -
 net/sched/act_tunnel_key.c | 2 --
 14 files changed, 1 insertion(+), 37 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8ca0bc223b30..1be1ec8368b6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -78,7 +78,6 @@ config WIREGUARD
 	depends on IPV6 || !IPV6
 	depends on !KMSAN # KMSAN doesn't support the crypto configs below
 	select NET_UDP_TUNNEL
-	select DST_CACHE
 	select CRYPTO
 	select CRYPTO_LIB_CURVE25519
 	select CRYPTO_LIB_CHACHA20POLY1305
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 1b7fae4c6b24..c41a857bf0ed 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -165,7 +165,6 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 
 	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
 	       sizeof(struct ip_tunnel_info) + md_size);
-#ifdef CONFIG_DST_CACHE
 	/* Unclone the dst cache if there is one */
 	if (new_md->u.tun_info.dst_cache.cache) {
 		int ret;
@@ -176,7 +175,6 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
 			return ERR_PTR(ret);
 		}
 	}
-#endif
 
 	skb_dst_drop(skb);
 	skb_dst_set(skb, &new_md->dst);
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 5cd64bb2104d..2fe04edc23b9 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -84,9 +84,7 @@ struct ip_tunnel_encap {
 struct ip_tunnel_info {
 	struct ip_tunnel_key	key;
 	struct ip_tunnel_encap	encap;
-#ifdef CONFIG_DST_CACHE
 	struct dst_cache	dst_cache;
-#endif
 	u8			options_len;
 	u8			mode;
 };
diff --git a/net/Kconfig b/net/Kconfig
index 3e57ccf0da27..21caffd1758d 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -438,10 +438,6 @@ config LWTUNNEL_BPF
 	  Allows to run BPF programs as a nexthop action following a route
 	  lookup for incoming and outgoing packets.
 
-config DST_CACHE
-	bool
-	default n
-
 config GRO_CELLS
 	bool
 	default n
diff --git a/net/core/Makefile b/net/core/Makefile
index 821aec06abf1..53582fef633b 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -13,7 +13,7 @@ obj-y		     += dev.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
 			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
 			fib_notifier.o xdp.o flow_offload.o gro.o \
-			netdev-genl.o netdev-genl-gen.o gso.o
+			netdev-genl.o netdev-genl-gen.o gso.o dst_cache.o
 
 obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
 
@@ -32,7 +32,6 @@ obj-$(CONFIG_CGROUP_NET_PRIO) += netprio_cgroup.o
 obj-$(CONFIG_CGROUP_NET_CLASSID) += netclassid_cgroup.o
 obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
-obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_GRO_CELLS) += gro_cells.o
 obj-$(CONFIG_FAILOVER) += failover.o
diff --git a/net/core/dst.c b/net/core/dst.c
index 95f533844f17..f035c39be104 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -291,10 +291,8 @@ EXPORT_SYMBOL_GPL(metadata_dst_alloc);
 
 void metadata_dst_free(struct metadata_dst *md_dst)
 {
-#ifdef CONFIG_DST_CACHE
 	if (md_dst->type == METADATA_IP_TUNNEL)
 		dst_cache_destroy(&md_dst->u.tun_info.dst_cache);
-#endif
 	if (md_dst->type == METADATA_XFRM)
 		dst_release(md_dst->u.xfrm_info.dst_orig);
 	kfree(md_dst);
@@ -326,10 +324,8 @@ void metadata_dst_free_percpu(struct metadata_dst __percpu *md_dst)
 	for_each_possible_cpu(cpu) {
 		struct metadata_dst *one_md_dst = per_cpu_ptr(md_dst, cpu);
 
-#ifdef CONFIG_DST_CACHE
 		if (one_md_dst->type == METADATA_IP_TUNNEL)
 			dst_cache_destroy(&one_md_dst->u.tun_info.dst_cache);
-#endif
 		if (one_md_dst->type == METADATA_XFRM)
 			dst_release(one_md_dst->u.xfrm_info.dst_orig);
 	}
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 8e94ed7c56a0..189f716b03e8 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -185,7 +185,6 @@ config NET_IPGRE_DEMUX
 
 config NET_IP_TUNNEL
 	tristate
-	select DST_CACHE
 	select GRO_CELLS
 	default n
 
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 80ccd6661aa3..f060d55dc249 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -682,13 +682,11 @@ static int ip_tun_build_state(struct net *net, struct nlattr *attr,
 		return err;
 	}
 
-#ifdef CONFIG_DST_CACHE
 	err = dst_cache_init(&tun_info->dst_cache, GFP_KERNEL);
 	if (err) {
 		lwtstate_free(new_state);
 		return err;
 	}
-#endif
 
 	if (tb[LWTUNNEL_IP_ID])
 		tun_info->key.tun_id = nla_get_be64(tb[LWTUNNEL_IP_ID]);
@@ -720,11 +718,9 @@ static int ip_tun_build_state(struct net *net, struct nlattr *attr,
 
 static void ip_tun_destroy_state(struct lwtunnel_state *lwtstate)
 {
-#ifdef CONFIG_DST_CACHE
 	struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate);
 
 	dst_cache_destroy(&tun_info->dst_cache);
-#endif
 }
 
 static int ip_tun_fill_encap_opts_geneve(struct sk_buff *skb,
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index 860aff5f8599..ecc0990d8cab 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -215,13 +215,11 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
 	struct rtable *rt = NULL;
 	struct flowi4 fl4;
 
-#ifdef CONFIG_DST_CACHE
 	if (dst_cache) {
 		rt = dst_cache_get_ip4(dst_cache, saddr);
 		if (rt)
 			return rt;
 	}
-#endif
 
 	memset(&fl4, 0, sizeof(fl4));
 	fl4.flowi4_mark = skb->mark;
@@ -244,10 +242,8 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
 		ip_rt_put(rt);
 		return ERR_PTR(-ELOOP);
 	}
-#ifdef CONFIG_DST_CACHE
 	if (dst_cache)
 		dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
-#endif
 	*saddr = fl4.saddr;
 	return rt;
 }
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 08d4b7132d4c..093c768d41ab 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -124,7 +124,6 @@ config IPV6_MIP6
 config IPV6_ILA
 	tristate "IPv6: Identifier Locator Addressing (ILA)"
 	depends on NETFILTER
-	select DST_CACHE
 	select LWTUNNEL
 	help
 	  Support for IPv6 Identifier Locator Addressing (ILA).
@@ -203,7 +202,6 @@ config IPV6_NDISC_NODETYPE
 config IPV6_TUNNEL
 	tristate "IPv6: IP-in-IPv6 tunnel (RFC2473)"
 	select INET6_TUNNEL
-	select DST_CACHE
 	select GRO_CELLS
 	help
 	  Support for IPv6-in-IPv6 and IPv4-in-IPv6 tunnels described in
@@ -291,7 +289,6 @@ config IPV6_SEG6_LWTUNNEL
 	bool "IPv6: Segment Routing Header encapsulation support"
 	depends on IPV6
 	select LWTUNNEL
-	select DST_CACHE
 	select IPV6_MULTIPLE_TABLES
 	help
 	  Support for encapsulation of packets within an outer IPv6
@@ -333,7 +330,6 @@ config IPV6_IOAM6_LWTUNNEL
 	bool "IPv6: IOAM Pre-allocated Trace insertion support"
 	depends on IPV6
 	select LWTUNNEL
-	select DST_CACHE
 	help
 	  Support for the insertion of IOAM Pre-allocated Trace
 	  Header using the lightweight tunnels mechanism.
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index c99053189ea8..de92aea01cfc 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -145,13 +145,11 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
 	struct dst_entry *dst = NULL;
 	struct flowi6 fl6;
 
-#ifdef CONFIG_DST_CACHE
 	if (dst_cache) {
 		dst = dst_cache_get_ip6(dst_cache, saddr);
 		if (dst)
 			return dst;
 	}
-#endif
 	memset(&fl6, 0, sizeof(fl6));
 	fl6.flowi6_mark = skb->mark;
 	fl6.flowi6_proto = IPPROTO_UDP;
@@ -173,10 +171,8 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
 		dst_release(dst);
 		return ERR_PTR(-ELOOP);
 	}
-#ifdef CONFIG_DST_CACHE
 	if (dst_cache)
 		dst_cache_set_ip6(dst_cache, dst, &fl6.saddr);
-#endif
 	*saddr = fl6.saddr;
 	return dst;
 }
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index f735d79d8be5..02cbd7f3f6dc 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -508,13 +508,11 @@ static int nft_tunnel_obj_init(const struct nft_ctx *ctx,
 		return -ENOMEM;
 
 	memcpy(&md->u.tun_info, &info, sizeof(info));
-#ifdef CONFIG_DST_CACHE
 	err = dst_cache_init(&md->u.tun_info.dst_cache, GFP_KERNEL);
 	if (err < 0) {
 		metadata_dst_free(md);
 		return err;
 	}
-#endif
 	ip_tunnel_info_opts_set(&md->u.tun_info, &priv->opts.u, priv->opts.len,
 				priv->opts.flags);
 	priv->md = md;
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 29a7081858cd..b7a5ab6374b8 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -13,7 +13,6 @@ config OPENVSWITCH
 	select LIBCRC32C
 	select MPLS
 	select NET_MPLS_GSO
-	select DST_CACHE
 	select NET_NSH
 	select NF_CONNTRACK_OVS if NF_CONNTRACK
 	select NF_NAT_OVS if NF_NAT
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 1536f8b16f1b..1da69fa82512 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -476,11 +476,9 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 			goto err_out;
 		}
 
-#ifdef CONFIG_DST_CACHE
 		ret = dst_cache_init(&metadata->u.tun_info.dst_cache, GFP_KERNEL);
 		if (ret)
 			goto release_tun_meta;
-#endif
 
 		if (opts_len) {
 			ret = tunnel_key_opts_set(tb[TCA_TUNNEL_KEY_ENC_OPTS],
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 4/4] net: route: replace route hints with input_dst_cache
  2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
                   ` (2 preceding siblings ...)
  2024-03-07 17:12 ` [PATCH net-next 3/4] net: route: always compile dst_cache Leone Fernando
@ 2024-03-07 17:12 ` Leone Fernando
  2024-03-09  4:53 ` [PATCH net-next 0/4] net: route: improve route hinting David Ahern
  4 siblings, 0 replies; 13+ messages in thread
From: Leone Fernando @ 2024-03-07 17:12 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, willemb
  Cc: netdev, linux-kernel, Leone Fernando

Replace route hints with cached dsts - ip_rcv_finish_core will first try
to use the cache and only then fall back to the demux or perform a full
lookup.

Only add newly found dsts to the cache after all the checks have passed
successfully to avoid adding a dropped packet's dst to the cache.

Multicast dsts are not added to the dst_cache as it will require additional
checks and multicast packets are rarer and a slower path anyway.

A check was added to ip_route_use_dst_cache that prevents forwarding
packets received by devices for which forwarding is disabled.

Relevant checks were added to ip_route_use_dst_cache to make sure the
dst can be used and to ensure IPCB(skb) flags are correct.

Signed-off-by: Leone Fernando <leone4fernando@gmail.com>
---
 include/net/route.h |  6 ++--
 net/ipv4/ip_input.c | 58 +++++++++++++++++++-----------------
 net/ipv4/route.c    | 72 +++++++++++++++++++++++++++++++++------------
 3 files changed, 88 insertions(+), 48 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index d4a0147942f1..0940383719a0 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -186,9 +186,9 @@ int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			  struct in_device *in_dev, u32 *itag);
 int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
 			 u8 tos, struct net_device *devin);
-int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
-		      u8 tos, struct net_device *devin,
-		      const struct sk_buff *hint);
+int ip_route_use_dst_cache(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+			   u8 tos, struct net_device *dev,
+			   struct dst_entry *dst);
 
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 5e9c8156656a..35c8b122d62f 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -305,30 +305,44 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 	return true;
 }
 
-static bool ip_can_use_hint(const struct sk_buff *skb, const struct iphdr *iph,
-			    const struct sk_buff *hint)
+static bool ip_can_add_dst_cache(struct sk_buff *skb, __u16 rt_type)
 {
-	return hint && !skb_dst(skb) && ip_hdr(hint)->daddr == iph->daddr &&
-	       ip_hdr(hint)->tos == iph->tos;
+	return skb_valid_dst(skb) &&
+	       rt_type != RTN_BROADCAST &&
+	       rt_type != RTN_MULTICAST &&
+	       !(IPCB(skb)->flags & IPSKB_MULTIPATH);
+}
+
+static bool ip_can_use_dst_cache(const struct net *net, struct sk_buff *skb)
+{
+	return !skb_dst(skb) && !fib4_has_custom_rules(net);
 }
 
 int tcp_v4_early_demux(struct sk_buff *skb);
 int udp_v4_early_demux(struct sk_buff *skb);
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev,
-			      const struct sk_buff *hint)
+			      struct sk_buff *skb, struct net_device *dev)
 {
+	struct dst_cache *dst_cache = net_generic(net, dst_cache_net_id);
 	const struct iphdr *iph = ip_hdr(skb);
+	struct dst_entry *dst;
 	int err, drop_reason;
 	struct rtable *rt;
+	bool do_cache;
 
 	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
 
-	if (ip_can_use_hint(skb, iph, hint)) {
-		err = ip_route_use_hint(skb, iph->daddr, iph->saddr, iph->tos,
-					dev, hint);
-		if (unlikely(err))
-			goto drop_error;
+	do_cache = ip_can_use_dst_cache(net, skb);
+	if (do_cache) {
+		dst = dst_cache_input_get_noref(dst_cache, skb);
+		if (dst) {
+			err = ip_route_use_dst_cache(skb, iph->daddr,
+						     iph->saddr, iph->tos,
+						     dev, dst);
+			if (unlikely(err))
+				goto drop_error;
+			do_cache = false;
+		}
 	}
 
 	if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) &&
@@ -418,6 +432,9 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
 		}
 	}
 
+	if (do_cache && ip_can_add_dst_cache(skb, rt->rt_type))
+		dst_cache_input_add(dst_cache, skb);
+
 	return NET_RX_SUCCESS;
 
 drop:
@@ -444,7 +461,7 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
+	ret = ip_rcv_finish_core(net, sk, skb, dev);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -581,21 +598,11 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
-static struct sk_buff *ip_extract_route_hint(const struct net *net,
-					     struct sk_buff *skb, int rt_type)
-{
-	if (fib4_has_custom_rules(net) || rt_type == RTN_BROADCAST ||
-	    IPCB(skb)->flags & IPSKB_MULTIPATH)
-		return NULL;
-
-	return skb;
-}
-
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
-	struct sk_buff *skb, *next, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
+	struct sk_buff *skb, *next;
 	struct list_head sublist;
 
 	INIT_LIST_HEAD(&sublist);
@@ -610,14 +617,11 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
-			hint = ip_extract_route_hint(net, skb,
-					       ((struct rtable *)dst)->rt_type);
-
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 73e742424e97..e7de683f7b49 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1787,6 +1787,24 @@ static void ip_handle_martian_source(struct net_device *dev,
 #endif
 }
 
+static void ip_route_set_doredirect(struct in_device *in_dev,
+				    struct in_device *out_dev,
+				    struct sk_buff *skb,
+				    u8 gw_family,
+				    __be32 gw4,
+				    __be32 saddr)
+{
+	if (out_dev == in_dev && IN_DEV_TX_REDIRECTS(out_dev) &&
+	    skb->protocol == htons(ETH_P_IP)) {
+		__be32 gw;
+
+		gw = gw_family == AF_INET ? gw4 : 0;
+		if (IN_DEV_SHARED_MEDIA(out_dev) ||
+		    inet_addr_onlink(out_dev, saddr, gw))
+			IPCB(skb)->flags |= IPSKB_DOREDIRECT;
+	}
+}
+
 /* called in rcu_read_lock() section */
 static int __mkroute_input(struct sk_buff *skb,
 			   const struct fib_result *res,
@@ -1819,15 +1837,10 @@ static int __mkroute_input(struct sk_buff *skb,
 	}
 
 	do_cache = res->fi && !itag;
-	if (out_dev == in_dev && err && IN_DEV_TX_REDIRECTS(out_dev) &&
-	    skb->protocol == htons(ETH_P_IP)) {
-		__be32 gw;
-
-		gw = nhc->nhc_gw_family == AF_INET ? nhc->nhc_gw.ipv4 : 0;
-		if (IN_DEV_SHARED_MEDIA(out_dev) ||
-		    inet_addr_onlink(out_dev, saddr, gw))
-			IPCB(skb)->flags |= IPSKB_DOREDIRECT;
-	}
+	if (err)
+		ip_route_set_doredirect(in_dev, out_dev, skb,
+					nhc->nhc_gw_family,
+					nhc->nhc_gw.ipv4, saddr);
 
 	if (skb->protocol != htons(ETH_P_IP)) {
 		/* Not IP (i.e. ARP). Do not create route, if it is
@@ -2157,14 +2170,15 @@ static int ip_mkroute_input(struct sk_buff *skb,
 
 /* Implements all the saddr-related checks as ip_route_input_slow(),
  * assuming daddr is valid and the destination is not a local broadcast one.
- * Uses the provided hint instead of performing a route lookup.
+ * Uses the provided dst from dst_cache instead of performing a route lookup.
  */
-int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-		      u8 tos, struct net_device *dev,
-		      const struct sk_buff *hint)
+int ip_route_use_dst_cache(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+			   u8 tos, struct net_device *dev,
+			   struct dst_entry *dst)
 {
+	struct in_device *out_dev = __in_dev_get_rcu(dst->dev);
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	struct rtable *rt = skb_rtable(hint);
+	struct rtable *rt = (struct rtable *)dst;
 	struct net *net = dev_net(dev);
 	int err = -EINVAL;
 	u32 tag = 0;
@@ -2178,21 +2192,43 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
 		goto martian_source;
 
-	if (rt->rt_type != RTN_LOCAL)
-		goto skip_validate_source;
+	if (ipv4_is_loopback(daddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_destination;
 
+	if (rt->rt_type != RTN_LOCAL) {
+		if (!IN_DEV_FORWARD(in_dev)) {
+			err = -EHOSTUNREACH;
+			goto out_err;
+		}
+		goto skip_validate_source;
+	}
 	tos &= IPTOS_RT_MASK;
 	err = fib_validate_source(skb, saddr, daddr, tos, 0, dev, in_dev, &tag);
 	if (err < 0)
 		goto martian_source;
 
+	if (err)
+		ip_route_set_doredirect(in_dev, out_dev, skb, rt->rt_gw_family,
+					rt->rt_gw4, saddr);
+
 skip_validate_source:
-	skb_dst_copy(skb, hint);
+	skb_dst_set_noref(skb, dst);
 	return 0;
 
 martian_source:
 	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+out_err:
 	return err;
+
+martian_destination:
+	RT_CACHE_STAT_INC(in_martian_dst);
+#ifdef CONFIG_IP_ROUTE_VERBOSE
+		if (IN_DEV_LOG_MARTIANS(in_dev))
+			net_warn_ratelimited("martian destination %pI4 from %pI4, dev %s\n",
+					     &daddr, &saddr, dev->name);
+#endif
+	err = -EINVAL;
+	goto out_err;
 }
 
 /* get device for dst_alloc with local routes */
@@ -2213,7 +2249,7 @@ static struct net_device *ip_rt_get_dev(struct net *net,
  *	addresses, because every properly looped back packet
  *	must have correct destination already attached by output routine.
  *	Changes in the enforced policies must be applied also to
- *	ip_route_use_hint().
+ *	ip_route_use_dst_cache().
  *
  *	Such approach solves two big problems:
  *	1. Not simplex devices are handled properly.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API
  2024-03-07 17:12 ` [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API Leone Fernando
@ 2024-03-09  3:55   ` Jakub Kicinski
  2024-03-14 14:04     ` Leone Fernando
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-03-09  3:55 UTC (permalink / raw)
  To: Leone Fernando
  Cc: davem, edumazet, pabeni, dsahern, willemb, netdev, linux-kernel

On Thu,  7 Mar 2024 18:12:00 +0100 Leone Fernando wrote:
> +static inline u64 create_dst_cache_key_ip4(const struct sk_buff *skb)
> +{
> +	struct iphdr *iphdr = ip_hdr(skb);
> +
> +	return (((u64)iphdr->daddr) << 8) | iphdr->tos;

Sparse complains that you're ignoring bitwise types:

include/net/dst_cache.h:170:19: warning: cast from restricted __be32
include/net/dst_cache.h:170:19: warning: cast from restricted __be32
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net: route: improve route hinting
  2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
                   ` (3 preceding siblings ...)
  2024-03-07 17:12 ` [PATCH net-next 4/4] net: route: replace route hints with input_dst_cache Leone Fernando
@ 2024-03-09  4:53 ` David Ahern
  2024-03-12 15:38   ` Leone Fernando
  2024-04-02 10:08   ` Leone Fernando
  4 siblings, 2 replies; 13+ messages in thread
From: David Ahern @ 2024-03-09  4:53 UTC (permalink / raw)
  To: Leone Fernando, davem, edumazet, kuba, pabeni, willemb
  Cc: netdev, linux-kernel

On 3/7/24 10:11 AM, Leone Fernando wrote:
> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
> sub-system. The hinting optimization improves performance by reusing
> previously found dsts instead of looking them up for each skb.
> 
> This patch series introduces a generalized version of the hinting mechanism that
> can "remember" a larger number of dsts. This reduces the number of dst
> lookups for frequently encountered daddrs.
> 
> Before diving into the code and the benchmarking results, it's important
> to address the deletion of the old route cache [2] and why
> this solution is different. The original cache was complicated,
> vulnerable to DOS attacks and had unstable performance.
> 
> The new input dst_cache is much simpler thanks to its lazy approach,
> improving performance without the overhead of the removed cache
> implementation. Instead of using timers and GC, the deletion of invalid
> entries is performed lazily during their lookups.
> The dsts are stored in a simple, lightweight, static hash table. This
> keeps the lookup times fast yet stable, preventing DOS upon cache misses.
> The new input dst_cache implementation is built over the existing
> dst_cache code which supplies a fast lockless percpu behavior.
> 
> I tested this patch using udp floods with different number of daddrs.
> The benchmarking setup is comprised of 3 machines: a sender,
> a forwarder and a receiver. I measured the PPS received by the receiver
> as the forwarder was running either the mainline kernel or the patched
> kernel, comparing the results. The dst_cache I tested in this benchmark
> used a total of 512 hash table entries, split into buckets of 4
> entries each.
> 
> These are the results:
>   UDP             mainline              patched                   delta
> conns pcpu         Kpps                  Kpps                       %
>    1              274.0255              269.2205                  -1.75
>    2              257.3748              268.0947                   4.17
>   15              241.3513              258.8016                   7.23
>  100              238.3419              258.4939                   8.46
>  500              238.5390              252.6425                   5.91
> 1000              238.7570              242.1820                   1.43
> 2000              238.7780              236.2640                  -1.05
> 4000              239.0440              233.5320                  -2.31
> 8000              239.3248              232.5680                  -2.82
> 

I have looked at all of the sets sent. I can not convince myself this is
a good idea, but at the same time I do not have constructive feedback on
why it is not acceptable. The gains are modest at best.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net: route: improve route hinting
  2024-03-09  4:53 ` [PATCH net-next 0/4] net: route: improve route hinting David Ahern
@ 2024-03-12 15:38   ` Leone Fernando
  2024-04-02 10:08   ` Leone Fernando
  1 sibling, 0 replies; 13+ messages in thread
From: Leone Fernando @ 2024-03-12 15:38 UTC (permalink / raw)
  To: David Ahern, davem, edumazet, kuba, pabeni, willemb; +Cc: netdev, linux-kernel

David Ahern wrote:
> I have looked at all of the sets sent. I can not convince myself this is
> a good idea, but at the same time I do not have constructive feedback on
> why it is not acceptable. The gains are modest at best.
> 

Thanks for the comment.

I believe an improvement of 5-8% in PPS is significant. 
Note that the cache is per-cpu (e.g., for a machine with 10 CPUs, the improvement
affects 10X the conns mentioned).

Could you please provide more information about what you don't like
in the patch?

Some possible issues I can think of:
- Do you think the improvement is not affecting the common case?
In this case, it can be solved by tweaking the cache parameters.

- In case performance degradation for some # of conns is problematic - we can
find ways to reduce it. For example, the degradation for 1 connection can
probably be solved by keeping the route hints.

Leone


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API
  2024-03-09  3:55   ` Jakub Kicinski
@ 2024-03-14 14:04     ` Leone Fernando
  2024-03-14 18:20       ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Leone Fernando @ 2024-03-14 14:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, dsahern, willemb, netdev, linux-kernel

Thanks Jakub. I'll fix it and submit a v2.
What do you think about the patch in general?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API
  2024-03-14 14:04     ` Leone Fernando
@ 2024-03-14 18:20       ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2024-03-14 18:20 UTC (permalink / raw)
  To: Leone Fernando
  Cc: davem, edumazet, pabeni, dsahern, willemb, netdev, linux-kernel

On Thu, 14 Mar 2024 15:04:02 +0100 Leone Fernando wrote:
> Thanks Jakub. I'll fix it and submit a v2.
> What do you think about the patch in general?

Dunno.. it's a bit hard to judge how much benefit we'd get
in real life scenarios. But I'm not a routing expert.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net: route: improve route hinting
  2024-03-09  4:53 ` [PATCH net-next 0/4] net: route: improve route hinting David Ahern
  2024-03-12 15:38   ` Leone Fernando
@ 2024-04-02 10:08   ` Leone Fernando
  2024-04-02 15:02     ` David Ahern
  1 sibling, 1 reply; 13+ messages in thread
From: Leone Fernando @ 2024-04-02 10:08 UTC (permalink / raw)
  To: David Ahern, davem, edumazet, kuba, pabeni, willemb; +Cc: netdev, linux-kernel

Hi David,

I plan to continue working on this patch, and it
would be helpful if you could share some more thoughts.
As I said before, I think this patch is significant, and my measurements
showed a consistent improvement in most cases.

Thanks,
Leone

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net: route: improve route hinting
  2024-04-02 10:08   ` Leone Fernando
@ 2024-04-02 15:02     ` David Ahern
  2024-04-03 11:50       ` Leone Fernando
  0 siblings, 1 reply; 13+ messages in thread
From: David Ahern @ 2024-04-02 15:02 UTC (permalink / raw)
  To: Leone Fernando, davem, edumazet, kuba, pabeni, willemb
  Cc: netdev, linux-kernel

On 4/2/24 4:08 AM, Leone Fernando wrote:
> Hi David,
> 
> I plan to continue working on this patch, and it
> would be helpful if you could share some more thoughts.
> As I said before, I think this patch is significant, and my measurements
> showed a consistent improvement in most cases.
> 

It seems to me patch 1 and a version of it for IPv6 should go in
independent of this set.

For the rest of it, Jakub's response was a good summary: it is hard to
know if there is a benefit to real workloads. Cache's consume resources
(memory and cpu) and will be wrong some percentage of the time
increasing overhead.

Also, it is targeted at a very narrow use case -- IPv4 only, no custom
FIB rules and no multipath.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 0/4] net: route: improve route hinting
  2024-04-02 15:02     ` David Ahern
@ 2024-04-03 11:50       ` Leone Fernando
  0 siblings, 0 replies; 13+ messages in thread
From: Leone Fernando @ 2024-04-03 11:50 UTC (permalink / raw)
  To: David Ahern, davem, edumazet, kuba, pabeni, willemb; +Cc: netdev, linux-kernel

Hi David,

> 
> For the rest of it, Jakub's response was a good summary: it is hard to
> know if there is a benefit to real workloads. Cache's consume resources
> (memory and cpu) and will be wrong some percentage of the time
> increasing overhead.
> 

I understand what you are saying here.
Do you have an idea for extra tests or measurements to make sure
the patch also improves in real workloads?

> Also, it is targeted at a very narrow use case -- IPv4 only, no custom
> FIB rules and no multipath.

Implementation for IPv6 is almost identical. I decided to start with IPv4 
and I plan to submit IPv6 support in a future patch. 
This patch basically improves the hinting mechanism that Paolo introduced
and it handles the same cases.
Adding support for FIB rules and multipath is a bit more complicated but 
also possible. I am willing to keep working on this patch to meet the 
remaining cases including IPv6.

Thanks,
Leone

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-03 11:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-07 17:11 [PATCH net-next 0/4] net: route: improve route hinting Leone Fernando
2024-03-07 17:11 ` [PATCH net-next 1/4] net: route: expire rt if the dst it holds is expired Leone Fernando
2024-03-07 17:12 ` [PATCH net-next 2/4] net: dst_cache: add input_dst_cache API Leone Fernando
2024-03-09  3:55   ` Jakub Kicinski
2024-03-14 14:04     ` Leone Fernando
2024-03-14 18:20       ` Jakub Kicinski
2024-03-07 17:12 ` [PATCH net-next 3/4] net: route: always compile dst_cache Leone Fernando
2024-03-07 17:12 ` [PATCH net-next 4/4] net: route: replace route hints with input_dst_cache Leone Fernando
2024-03-09  4:53 ` [PATCH net-next 0/4] net: route: improve route hinting David Ahern
2024-03-12 15:38   ` Leone Fernando
2024-04-02 10:08   ` Leone Fernando
2024-04-02 15:02     ` David Ahern
2024-04-03 11:50       ` Leone Fernando

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).