* [PATCH net-next v2 0/2] net: introduce and use route hint
From: Paolo Abeni @ 2019-11-18 11:01 UTC
To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

This series leverages the listification infrastructure to avoid
unnecessary route lookups on ingress packets. In the absence of policy
routing, packets with equal daddr will usually land on the same dst.

When processing packet bursts (lists) we can easily reference the previous
dst entry. When we hit the 'same destination' condition we can avoid the
route lookup, copying the already available dst.

Detailed performance numbers are available in the individual commit
messages.

v1 -> v2
 - fix build issue with !CONFIG_IP*_MULTIPLE_TABLES
 - fix potential race in ip6_list_rcv_finish()

Paolo Abeni (2):
  ipv6: introduce and uses route look hints for list input
  ipv4: use dst hint for ipv4 list receive

 include/net/route.h  | 11 +++++++++++
 net/ipv4/ip_input.c  | 38 +++++++++++++++++++++++++++++++++-----
 net/ipv4/route.c     | 38 ++++++++++++++++++++++++++++++++++++++
 net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 118 insertions(+), 9 deletions(-)

-- 
2.21.0
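The idea in the cover letter can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the code from the series: `struct packet`, `struct route`, `slow_route_lookup()` and `process_batch()` are hypothetical stand-ins, and the hint is refreshed on every miss, which is a simplification of what the patches do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel objects; none of these names come
 * from the series.
 */
struct route { int id; };

struct packet {
	uint32_t daddr;			/* destination address */
	const struct route *dst;	/* filled in by process_batch() */
};

struct route_hint {
	uint32_t daddr;
	const struct route *dst;
};

/* Counts how often the "expensive" lookup runs. */
static unsigned int slow_lookups;

/* Toy lookup: picks one of four routes from the low address bits. */
static const struct route *slow_route_lookup(uint32_t daddr)
{
	static const struct route table[4] = { {0}, {1}, {2}, {3} };

	slow_lookups++;
	return &table[daddr & 3];
}

/* Resolve the route of every packet in a batch, reusing the previous
 * result whenever the destination address repeats.
 */
static void process_batch(struct packet *pkts, size_t n)
{
	struct route_hint hint_storage, *hint = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct packet *p = &pkts[i];

		if (hint && hint->daddr == p->daddr) {
			p->dst = hint->dst;	/* hint hit: skip the lookup */
			continue;
		}

		p->dst = slow_route_lookup(p->daddr);
		hint_storage.daddr = p->daddr;
		hint_storage.dst = p->dst;
		hint = &hint_storage;
	}
}
```

With a burst of packets that mostly share one destination, `slow_lookups` grows with the number of address changes rather than with the number of packets, which is the effect the series is after.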
* [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
From: Paolo Abeni @ 2019-11-18 11:01 UTC
To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

When doing RX batch packet processing, we currently always repeat
the route lookup for each ingress packet. If no policy routing is
configured, and IPV6_SUBTREES is disabled at build time, we
know that packets with the same destination address will use
the same dst.

This change tries to avoid the per-packet route lookup by caching
the destination address of the latest successful lookup, and
reusing it for the next packet when the above conditions are
in place. Ingress traffic for most servers should fit.

The measured performance delta under UDP flood vs a recvmmsg
receiver is as follows:

	vanilla		patched		delta
	Kpps		Kpps		%
	1431		1664		+14

In the worst-case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v1 -> v2:
 - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
 - fix potential race when fib6_has_custom_rules is set
   while processing a packet batch

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index ef7f707d9ae3..f559ad6b09ef 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -44,10 +44,16 @@
 #include <net/inet_ecn.h>
 #include <net/dst_metadata.h>
 
+struct ip6_route_input_hint {
+	unsigned long refdst;
+	struct in6_addr daddr;
+};
+
 INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
 static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
-				struct sk_buff *skb)
+				struct sk_buff *skb,
+				struct ip6_route_input_hint *hint)
 {
 	void (*edemux)(struct sk_buff *skb);
 
@@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
 			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
 					udp_v6_early_demux, skb);
 	}
-	if (!skb_valid_dst(skb))
+
+	if (skb_valid_dst(skb))
+		return;
+
+	if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
+		__skb_dst_copy(skb, hint->refdst);
+	else
 		ip6_route_input(skb);
 }
 
@@ -71,7 +83,7 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	skb = l3mdev_ip6_rcv(skb);
 	if (!skb)
 		return NET_RX_SUCCESS;
-	ip6_rcv_finish_core(net, sk, skb);
+	ip6_rcv_finish_core(net, sk, skb, NULL);
 
 	return dst_input(skb);
 }
@@ -86,9 +98,20 @@ static void ip6_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip6_can_cache_route_hint(struct net *net)
+{
+	return !IS_ENABLED(IPV6_SUBTREES) &&
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+		!net->ipv6.fib6_has_custom_rules;
+#else
+		1;
+#endif
+}
+
 static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 				struct list_head *head)
 {
+	struct ip6_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip6_rcv(skb);
 		if (!skb)
 			continue;
-		ip6_rcv_finish_core(net, sk, skb);
+		ip6_rcv_finish_core(net, sk, skb, hint);
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			if (ip6_can_cache_route_hint(net)) {
+				_hint.refdst = skb->_skb_refdst;
+				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
+				       sizeof(_hint.daddr));
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip6_sublist_rcv_finish(&sublist);
-- 
2.21.0
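The hint handling in the patch above boils down to two small steps: refresh or drop the hint whenever the dst changes, and compare the cached 128-bit address against the next packet. A rough userspace sketch of just those two steps follows, assuming hypothetical names (`struct addr6`, `struct hint6`, `hint6_update()`, `hint6_matches()`) that do not exist in the kernel. The `can_cache` argument models a condition that is re-evaluated at every update, so a flag that flips mid-batch makes the hint go away instead of leaving a stale pointer behind.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace types, loosely modelled on the patch. */
struct addr6 { unsigned char b[16]; };

struct hint6 {
	const void *dst;	/* opaque stand-in for the cached dst */
	struct addr6 daddr;	/* destination address the dst belongs to */
};

/* Refresh the hint from the packet just routed, or drop it when caching
 * is not allowed. Returning NULL is what prevents a later reuse.
 */
static struct hint6 *hint6_update(struct hint6 *storage,
				  const struct addr6 *daddr,
				  const void *dst, bool can_cache)
{
	if (!can_cache)
		return NULL;

	storage->dst = dst;
	memcpy(&storage->daddr, daddr, sizeof(storage->daddr));
	return storage;
}

/* True when a hint is present and its address equals the packet's. */
static bool hint6_matches(const struct hint6 *hint, const struct addr6 *daddr)
{
	return hint && memcmp(&hint->daddr, daddr, sizeof(*daddr)) == 0;
}
```

A caller would keep a `struct hint6 *hint` across loop iterations, test `hint6_matches(hint, &pkt_daddr)` before the lookup, and assign `hint = hint6_update(...)` after a lookup that produced a new dst.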
* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
From: Willem de Bruijn @ 2019-11-18 20:29 UTC
To: Paolo Abeni
Cc: Network Development, David S. Miller, Willem de Bruijn, Edward Cree

On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When doing RX batch packet processing, we currently always repeat
> the route lookup for each ingress packet. If no policy routing is
> configured, and IPV6_SUBTREES is disabled at build time, we
> know that packets with the same destination address will use
> the same dst.
>
> This change tries to avoid the per-packet route lookup by caching
> the destination address of the latest successful lookup, and
> reusing it for the next packet when the above conditions are
> in place. Ingress traffic for most servers should fit.
>
> The measured performance delta under UDP flood vs a recvmmsg
> receiver is as follows:
>
>	vanilla		patched		delta
>	Kpps		Kpps		%
>	1431		1664		+14

Since IPv4 speed-up is almost half and code considerably more complex,
maybe only do IPv6?

>
> In the worst-case scenario - each packet has a different
> destination address - the performance delta is within noise
> range.
>
> v1 -> v2:
>  - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
>  - fix potential race when fib6_has_custom_rules is set
>    while processing a packet batch
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index ef7f707d9ae3..f559ad6b09ef 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -44,10 +44,16 @@
>  #include <net/inet_ecn.h>
>  #include <net/dst_metadata.h>
>
> +struct ip6_route_input_hint {
> +	unsigned long refdst;
> +	struct in6_addr daddr;
> +};
> +
>  INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
>  INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
>  static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> -				struct sk_buff *skb)
> +				struct sk_buff *skb,
> +				struct ip6_route_input_hint *hint)
>  {
>  	void (*edemux)(struct sk_buff *skb);
>
> @@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
>  			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
>  					udp_v6_early_demux, skb);
>  	}
> -	if (!skb_valid_dst(skb))
> +
> +	if (skb_valid_dst(skb))
> +		return;
> +
> +	if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
> +		__skb_dst_copy(skb, hint->refdst);
> +	else
>  		ip6_route_input(skb);

Is it possible to do the address comparison in ip6_list_rcv_finish
itself and pass a pointer to refdst if safe? To avoid new struct
definition, memcpy and to have all logic in one place. Need to
keep a pointer to the prev skb, then, instead.

>  static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
>  				struct list_head *head)
>  {
> +	struct ip6_route_input_hint _hint, *hint = NULL;
>  	struct dst_entry *curr_dst = NULL;
>  	struct sk_buff *skb, *next;
>  	struct list_head sublist;
> @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
>  		skb = l3mdev_ip6_rcv(skb);
>  		if (!skb)
>  			continue;
> -		ip6_rcv_finish_core(net, sk, skb);
> +		ip6_rcv_finish_core(net, sk, skb, hint);
>  		dst = skb_dst(skb);
>  		if (curr_dst != dst) {
> +			if (ip6_can_cache_route_hint(net)) {
> +				_hint.refdst = skb->_skb_refdst;
> +				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> +				       sizeof(_hint.daddr));
> +				hint = &_hint;
> +			} else {
> +				hint = NULL;
> +			}

not needed. ip6_can_cache_route_hint is the same for all iterations of
the loop (indeed, compile time static), so if false, hint is never
set.

> +
>  			/* dispatch old sublist */
>  			if (!list_empty(&sublist))
>  				ip6_sublist_rcv_finish(&sublist);
> --
> 2.21.0
>
* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
From: Paolo Abeni @ 2019-11-18 21:58 UTC
To: Willem de Bruijn; +Cc: Network Development, David S. Miller, Edward Cree

On Mon, 2019-11-18 at 15:29 -0500, Willem de Bruijn wrote:
> On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > When doing RX batch packet processing, we currently always repeat
> > the route lookup for each ingress packet. If no policy routing is
> > configured, and IPV6_SUBTREES is disabled at build time, we
> > know that packets with the same destination address will use
> > the same dst.
> >
> > This change tries to avoid the per-packet route lookup by caching
> > the destination address of the latest successful lookup, and
> > reusing it for the next packet when the above conditions are
> > in place. Ingress traffic for most servers should fit.
> >
> > The measured performance delta under UDP flood vs a recvmmsg
> > receiver is as follows:
> >
> >	vanilla		patched		delta
> >	Kpps		Kpps		%
> >	1431		1664		+14
>
> Since IPv4 speed-up is almost half and code considerably more complex,
> maybe only do IPv6?

uhmm... I would avoid that kind of asymmetry, and I would not look
down on an 8% speedup, if possible.

> > In the worst-case scenario - each packet has a different
> > destination address - the performance delta is within noise
> > range.
> >
> > v1 -> v2:
> >  - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
> >  - fix potential race when fib6_has_custom_rules is set
> >    while processing a packet batch
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> > index ef7f707d9ae3..f559ad6b09ef 100644
> > --- a/net/ipv6/ip6_input.c
> > +++ b/net/ipv6/ip6_input.c
> > @@ -44,10 +44,16 @@
> >  #include <net/inet_ecn.h>
> >  #include <net/dst_metadata.h>
> >
> > +struct ip6_route_input_hint {
> > +	unsigned long refdst;
> > +	struct in6_addr daddr;
> > +};
> > +
> >  INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
> >  INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
> >  static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> > -				struct sk_buff *skb)
> > +				struct sk_buff *skb,
> > +				struct ip6_route_input_hint *hint)
> >  {
> >  	void (*edemux)(struct sk_buff *skb);
> >
> > @@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
> >  			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
> >  					udp_v6_early_demux, skb);
> >  	}
> > -	if (!skb_valid_dst(skb))
> > +
> > +	if (skb_valid_dst(skb))
> > +		return;
> > +
> > +	if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
> > +		__skb_dst_copy(skb, hint->refdst);
> > +	else
> >  		ip6_route_input(skb);
>
> Is it possible to do the address comparison in ip6_list_rcv_finish
> itself and pass a pointer to refdst if safe? To avoid new struct
> definition, memcpy and to have all logic in one place. Need to
> keep a pointer to the prev skb, then, instead.

I haven't thought about that. Sounds promising. I'll try, thanks.

> >  static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> >  				struct list_head *head)
> >  {
> > +	struct ip6_route_input_hint _hint, *hint = NULL;
> >  	struct dst_entry *curr_dst = NULL;
> >  	struct sk_buff *skb, *next;
> >  	struct list_head sublist;
> > @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> >  		skb = l3mdev_ip6_rcv(skb);
> >  		if (!skb)
> >  			continue;
> > -		ip6_rcv_finish_core(net, sk, skb);
> > +		ip6_rcv_finish_core(net, sk, skb, hint);
> >  		dst = skb_dst(skb);
> >  		if (curr_dst != dst) {
> > +			if (ip6_can_cache_route_hint(net)) {
> > +				_hint.refdst = skb->_skb_refdst;
> > +				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> > +				       sizeof(_hint.daddr));
> > +				hint = &_hint;
> > +			} else {
> > +				hint = NULL;
> > +			}
>
> not needed. ip6_can_cache_route_hint is the same for all iterations of
> the loop (indeed, compile time static), so if false, hint is never
> set.

I think this is needed, instead: if CONFIG_IPV6_MULTIPLE_TABLES=y,
fib6_has_custom_rules can change at runtime - from 'false' to 'true'.
If we don't reset 'hint', we could end up with use-after-free.

Cheers,

Paolo
* Re: [PATCH net-next v2 1/2] ipv6: introduce and uses route look hints for list input
From: Willem de Bruijn @ 2019-11-19 14:10 UTC
To: Paolo Abeni
Cc: Willem de Bruijn, Network Development, David S. Miller, Edward Cree

On Mon, Nov 18, 2019 at 4:59 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2019-11-18 at 15:29 -0500, Willem de Bruijn wrote:
> > On Mon, Nov 18, 2019 at 6:03 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > When doing RX batch packet processing, we currently always repeat
> > > the route lookup for each ingress packet. If no policy routing is
> > > configured, and IPV6_SUBTREES is disabled at build time, we
> > > know that packets with the same destination address will use
> > > the same dst.
> > >
> > > This change tries to avoid the per-packet route lookup by caching
> > > the destination address of the latest successful lookup, and
> > > reusing it for the next packet when the above conditions are
> > > in place. Ingress traffic for most servers should fit.
> > >
> > > The measured performance delta under UDP flood vs a recvmmsg
> > > receiver is as follows:
> > >
> > >	vanilla		patched		delta
> > >	Kpps		Kpps		%
> > >	1431		1664		+14
> >
> > Since IPv4 speed-up is almost half and code considerably more complex,
> > maybe only do IPv6?
>
> uhmm... I would avoid that kind of asymmetry, and I would not look
> down on an 8% speedup, if possible.

Okay, that's fair.

> > > @@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
> > >  		skb = l3mdev_ip6_rcv(skb);
> > >  		if (!skb)
> > >  			continue;
> > > -		ip6_rcv_finish_core(net, sk, skb);
> > > +		ip6_rcv_finish_core(net, sk, skb, hint);
> > >  		dst = skb_dst(skb);
> > >  		if (curr_dst != dst) {
> > > +			if (ip6_can_cache_route_hint(net)) {
> > > +				_hint.refdst = skb->_skb_refdst;
> > > +				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
> > > +				       sizeof(_hint.daddr));
> > > +				hint = &_hint;
> > > +			} else {
> > > +				hint = NULL;
> > > +			}
> >
> > not needed. ip6_can_cache_route_hint is the same for all iterations of
> > the loop (indeed, compile time static), so if false, hint is never
> > set.
>
> I think this is needed, instead: if CONFIG_IPV6_MULTIPLE_TABLES=y,
> fib6_has_custom_rules can change at runtime - from 'false' to 'true'.
> If we don't reset 'hint', we could end up with use-after-free.

Uhm, of course, this is not compile time static at all. I clearly
missed a part.

But such a config change does not expect instantaneous effect on
packets in flight, like those in the recv rcu critical section? In
which case it should be safe to treat all skbs in the list the same.

I would need to read that code more closely to be certain, and the
current solution errs on the side of caution, so is definitely fine as
is, of course.
* [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
From: Paolo Abeni @ 2019-11-18 11:01 UTC
To: netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

This is similar to the previous change, with some additional
IPv4-specific quirks. Even when using the route hint we still have to
perform additional per-packet checks about source address validity: a
new helper is added to wrap them.

Moreover, the ipv4 route lookup, even in the absence of policy routing,
may depend on the packet's ToS, so we cache that value, too.

Explicitly avoid hints for local broadcast: this simplifies the code
and broadcasts are a slower path anyway.

UDP flood performance vs a recvmmsg() receiver:

	vanilla		patched		delta
	Kpps		Kpps		%
	1683		1833		+8

In the worst-case scenario - each packet has a different
destination address - the performance delta is within noise
range.

v1 -> v2:
 - fix build issue with !CONFIG_IP_MULTIPLE_TABLES

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/route.h | 11 +++++++++++
 net/ipv4/ip_input.c | 38 +++++++++++++++++++++++++++++++++-----
 net/ipv4/route.c    | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6c516840380d..f7a8a52318cd 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -185,6 +185,17 @@ int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
 		       u8 tos, struct net_device *devin,
 		       struct fib_result *res);
 
+struct ip_route_input_hint {
+	unsigned long refdst;
+	__be32 daddr;
+	char tos;
+	bool local;
+};
+
+int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
+		      u8 tos, struct net_device *devin,
+		      struct ip_route_input_hint *hint);
+
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
 {
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 24a95126e698..25f6fcc65380 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -305,7 +305,8 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 INDIRECT_CALLABLE_DECLARE(int udp_v4_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(int tcp_v4_early_demux(struct sk_buff *));
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev)
+			      struct sk_buff *skb, struct net_device *dev,
+			      struct ip_route_input_hint *hint)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	int (*edemux)(struct sk_buff *skb);
@@ -335,8 +336,12 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (!skb_valid_dst(skb)) {
-		err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					   iph->tos, dev);
+		if (hint && hint->daddr == iph->daddr && hint->tos == iph->tos)
+			err = ip_route_use_hint(skb, iph->daddr, iph->saddr,
+						iph->tos, dev, hint);
+		else
+			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+						   iph->tos, dev);
 		if (unlikely(err))
 			goto drop_error;
 	}
@@ -408,7 +413,7 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev);
+	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
+{
+	return rt->rt_type != RTN_BROADCAST &&
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+		!net->ipv6.fib6_has_custom_rules;
+#else
+		1;
+#endif
+}
+
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
+	struct ip_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -554,11 +570,23 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			struct rtable *rt = (struct rtable *)dst;
+
+			if (ip_can_cache_route_hint(net, rt)) {
+				_hint.refdst = skb->_skb_refdst;
+				_hint.daddr = ip_hdr(skb)->daddr;
+				_hint.tos = ip_hdr(skb)->tos;
+				_hint.local = rt->rt_type == RTN_LOCAL;
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dcc4fa10138d..b0ddff17db80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2019,6 +2019,44 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+/* Implements all the saddr-related checks as ip_route_input_slow(),
+ * assuming daddr is valid and this is not a local broadcast.
+ * Uses the provided hint instead of performing a route lookup.
+ */
+int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		      u8 tos, struct net_device *dev,
+		      struct ip_route_input_hint *hint)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	int err = -EINVAL;
+	u32 itag = 0;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	if (ipv4_is_zeronet(saddr))
+		goto martian_source;
+
+	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_source;
+
+	if (hint->local) {
+		err = fib_validate_source(skb, saddr, daddr, tos, 0, dev,
+					  in_dev, &itag);
+		if (err < 0)
+			goto martian_source;
+	}
+
+	err = 0;
+	__skb_dst_copy(skb, hint->refdst);
+	return err;
+
+martian_source:
+	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+	return err;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet
-- 
2.21.0
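The source-address checks that `ip_route_use_hint()` keeps performing on a hint hit can be tried out in userspace. The sketch below is an approximation under stated assumptions: addresses are plain host-order `uint32_t` values (the kernel helpers take `__be32`), the helper names are hypothetical, `route_localnet` stands in for `IN_DEV_NET_ROUTE_LOCALNET()`, and the `fib_validate_source()` step for local routes is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-byte-order approximations of the address class tests; these are
 * hypothetical helpers, not the kernel's ipv4_is_*() functions.
 */
static bool is_multicast(uint32_t a)
{
	return (a & 0xf0000000u) == 0xe0000000u;	/* 224.0.0.0/4 */
}

static bool is_lbcast(uint32_t a)
{
	return a == 0xffffffffu;			/* 255.255.255.255 */
}

static bool is_zeronet(uint32_t a)
{
	return (a & 0xff000000u) == 0;			/* 0.0.0.0/8 */
}

static bool is_loopback(uint32_t a)
{
	return (a & 0xff000000u) == 0x7f000000u;	/* 127.0.0.0/8 */
}

/* Same order of checks as the saddr part of the helper in the patch:
 * a source address that fails any of them must not take the hint path.
 */
static bool saddr_is_martian(uint32_t saddr, bool route_localnet)
{
	if (is_multicast(saddr) || is_lbcast(saddr))
		return true;

	if (is_zeronet(saddr))
		return true;

	if (is_loopback(saddr) && !route_localnet)
		return true;

	return false;
}
```

The point of keeping these checks is that the hint only records a decision about the destination; the source address still differs from packet to packet and has to be validated each time.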
* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
From: kbuild test robot @ 2019-11-18 14:11 UTC
To: Paolo Abeni
Cc: kbuild-all, netdev, David S. Miller, Willem de Bruijn, Edward Cree

Hi Paolo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on v5.4-rc8 next-20191115]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-introduce-and-use-route-hint/20191118-195936
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 19b7e21c55c81713c4011278143006af9f232504
config: mips-malta_kvm_defconfig (attached as .config)
compiler: mipsel-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=mips

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   net//ipv4/ip_input.c: In function 'ip_can_cache_route_hint':
>> net//ipv4/ip_input.c:547:19: error: 'struct netns_ipv6' has no member named 'fib6_has_custom_rules'
      !net->ipv6.fib6_has_custom_rules;
                      ^
   net//ipv4/ip_input.c:551:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +547 net//ipv4/ip_input.c

   542
   543	static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
   544	{
   545		return rt->rt_type != RTN_BROADCAST &&
   546	#ifdef CONFIG_IP_MULTIPLE_TABLES
 > 547			!net->ipv6.fib6_has_custom_rules;
   548	#else
   549			1;
   550	#endif
   551	}
   552

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20609 bytes --]
* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
From: David Ahern @ 2019-11-18 16:07 UTC
To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

On 11/18/19 4:01 AM, Paolo Abeni wrote:
> @@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
>  	}
>  }
>
> +static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
> +{
> +	return rt->rt_type != RTN_BROADCAST &&
> +#ifdef CONFIG_IP_MULTIPLE_TABLES
> +		!net->ipv6.fib6_has_custom_rules;

that should be ipv4, not ipv6, right?

Also, for readability it would be better to have 2 helpers in
include/net/fib_rules.h that return true/false and manage the net
namespace issue.

> +#else
> +		1;
> +#endif
> +}
> +
* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
From: Paolo Abeni @ 2019-11-18 16:31 UTC
To: David Ahern, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

Hi,

Thank you for the feedback.

On Mon, 2019-11-18 at 09:07 -0700, David Ahern wrote:
> On 11/18/19 4:01 AM, Paolo Abeni wrote:
> > @@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
> >  	}
> >  }
> >
> > +static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
> > +{
> > +	return rt->rt_type != RTN_BROADCAST &&
> > +#ifdef CONFIG_IP_MULTIPLE_TABLES
> > +		!net->ipv6.fib6_has_custom_rules;
>
> that should be ipv4, not ipv6, right?

Indeed. More coffee needed here, sorry.

> Also, for readability it would be better to have 2 helpers in
> include/net/fib_rules.h that return true/false and manage the net
> namespace issue.

Double checking I parsed the above correctly. Do you mean something
like the following - I think net/ip_fib.h fits more, as it already
deals with CONFIG_IP_MULTIPLE_TABLES?

---
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 52b2406a5dfc..b6c5cd544402 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -272,6 +272,11 @@ void fib_free_table(struct fib_table *tb);
 #define TABLE_LOCAL_INDEX	(RT_TABLE_LOCAL & (FIB_TABLE_HASHSZ - 1))
 #define TABLE_MAIN_INDEX	(RT_TABLE_MAIN & (FIB_TABLE_HASHSZ - 1))
 
+static bool fib4_has_custom_rules(struct net *net)
+{
+	return 0;
+}
+
 static inline struct fib_table *fib_get_table(struct net *net, u32 id)
 {
 	struct hlist_node *tb_hlist;
@@ -341,6 +346,11 @@ void __net_exit fib4_rules_exit(struct net *net);
 struct fib_table *fib_new_table(struct net *net, u32 id);
 struct fib_table *fib_get_table(struct net *net, u32 id);
 
+static bool fib4_has_custom_rules(struct net *net)
+{
+	return net->ipv4.fib_has_custom_rules;
+}
+
 int __fib_lookup(struct net *net, struct flowi4 *flp,
 		 struct fib_result *res, unsigned int flags);
 
---

plus something similar for the previous patch, in include/net/ip6_fib.h

Thank you,

Paolo
* Re: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
From: David Ahern @ 2019-11-18 16:40 UTC
To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Edward Cree

On 11/18/19 9:31 AM, Paolo Abeni wrote:
>> Also, for readability it would be better to have 2 helpers in
>> include/net/fib_rules.h that return true/false and manage the net
>> namespace issue.
>
> Double checking I parsed the above correctly. Do you mean something
> like the following - I think net/ip_fib.h fits more, as it already
> deals with CONFIG_IP_MULTIPLE_TABLES?

sure. And it looks like they already exist in
net/ipv4/fib_frontend.c, so those can be moved to ip_fib.h
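The readability argument for the two helpers can be shown with a small userspace model. Everything here is a hypothetical stand-in: `SKETCH_IP_MULTIPLE_TABLES` plays the role of the Kconfig option, `struct net_sketch` replaces `struct net`, and `can_cache_route_hint()` is only shaped like the caller discussed above. The sketch assumes nothing about where the real helpers end up living.

```c
#include <stdbool.h>

/* Stand-in for the Kconfig option; remove the define to model a build
 * without policy routing support.
 */
#define SKETCH_IP_MULTIPLE_TABLES 1

/* Hypothetical miniature of the per-namespace state. */
struct netns_ipv4_sketch { bool fib_has_custom_rules; };
struct net_sketch { struct netns_ipv4_sketch ipv4; };

#ifdef SKETCH_IP_MULTIPLE_TABLES
/* Policy routing built in: report the runtime flag. */
static inline bool fib4_has_custom_rules(const struct net_sketch *net)
{
	return net->ipv4.fib_has_custom_rules;
}
#else
/* Policy routing compiled out: custom rules can never exist. */
static inline bool fib4_has_custom_rules(const struct net_sketch *net)
{
	(void)net;
	return false;
}
#endif

/* With the #ifdef hidden in the helpers, the caller is one expression. */
static inline bool can_cache_route_hint(const struct net_sketch *net,
					bool is_broadcast)
{
	return !is_broadcast && !fib4_has_custom_rules(net);
}
```

Both variants share one signature, so the conditional compilation stays in a single place and a caller cannot accidentally reach for the wrong namespace member.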