* [Patch net] ipv4: restore rt->fi for reference counting @ 2017-05-04 21:54 Cong Wang 2017-05-07 18:53 ` Eric Dumazet 2017-05-08 18:35 ` David Miller 0 siblings, 2 replies; 43+ messages in thread From: Cong Wang @ 2017-05-04 21:54 UTC (permalink / raw) To: netdev; +Cc: andreyknvl, edumazet, Cong Wang IPv4 dst could use fi->fib_metrics to store metrics but fib_info itself is refcnt'ed, so without taking a refcnt fi and fi->fib_metrics could be freed while dst metrics still points to it. This triggers use-after-free as reported by Andrey twice. This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to restore this reference counting. It is a quick fix for -net and -stable, for -net-next, as Eric suggested, we can consider doing reference counting for metrics itself instead of relying on fib_info. IPv6 is very different, it copies or steals the metrics from mx6_config in fib6_commit_metrics() so probably doesn't need a refcnt. Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> --- include/net/route.h | 1 + net/ipv4/route.c | 18 +++++++++++++++++- 2 files changed, 18 insertions(+), 1 deletion(-) diff --git a/include/net/route.h b/include/net/route.h index 2cc0e14..4335eb7 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -69,6 +69,7 @@ struct rtable { struct list_head rt_uncached; struct uncached_list *rt_uncached_list; + struct fib_info *fi; /* for refcnt to shared metrics */ }; static inline bool rt_is_input_route(const struct rtable *rt) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 655d9ee..f647310 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst) { struct rtable *rt = (struct rtable *) dst; + if (rt->fi) { + fib_info_put(rt->fi); + rt->fi = NULL; + } + if (!list_empty(&rt->rt_uncached)) { struct uncached_list *ul = rt->rt_uncached_list; @@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt) !rt_is_expired(rt); } +static void rt_init_metrics(struct rtable *rt, struct fib_info *fi) +{ + if (fi->fib_metrics != (u32 *)dst_default_metrics) { + fib_info_hold(fi); + rt->fi = fi; + } + + dst_init_metrics(&rt->dst, fi->fib_metrics, true); +} + static void rt_set_nexthop(struct rtable *rt, __be32 daddr, const struct fib_result *res, struct fib_nh_exception *fnhe, @@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr, rt->rt_gateway = nh->nh_gw; rt->rt_uses_gateway = 1; } - dst_init_metrics(&rt->dst, fi->fib_metrics, true); + rt_init_metrics(rt, fi); #ifdef CONFIG_IP_ROUTE_CLASSID rt->dst.tclassid = nh->nh_tclassid; #endif @@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev, rt->rt_gateway = 0; rt->rt_uses_gateway = 0; rt->rt_table_id = 0; + rt->fi = NULL; INIT_LIST_HEAD(&rt->rt_uncached); rt->dst.output = ip_output; -- 2.5.5 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-04 21:54 [Patch net] ipv4: restore rt->fi for reference counting Cong Wang @ 2017-05-07 18:53 ` Eric Dumazet 2017-05-08 18:35 ` David Miller 1 sibling, 0 replies; 43+ messages in thread From: Eric Dumazet @ 2017-05-07 18:53 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, andreyknvl, edumazet On Thu, 2017-05-04 at 14:54 -0700, Cong Wang wrote: > IPv4 dst could use fi->fib_metrics to store metrics but fib_info > itself is refcnt'ed, so without taking a refcnt fi and > fi->fib_metrics could be freed while dst metrics still points to > it. This triggers use-after-free as reported by Andrey twice. > > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to > restore this reference counting. It is a quick fix for -net and > -stable, for -net-next, as Eric suggested, we can consider doing > reference counting for metrics itself instead of relying on fib_info. > > IPv6 is very different, it copies or steals the metrics from mx6_config > in fib6_commit_metrics() so probably doesn't need a refcnt. > > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > Reported-by: Andrey Konovalov <andreyknvl@google.com> > Tested-by: Andrey Konovalov <andreyknvl@google.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Thanks ! ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-04 21:54 [Patch net] ipv4: restore rt->fi for reference counting Cong Wang 2017-05-07 18:53 ` Eric Dumazet @ 2017-05-08 18:35 ` David Miller 2017-05-09 0:01 ` Eric Dumazet 1 sibling, 1 reply; 43+ messages in thread From: David Miller @ 2017-05-08 18:35 UTC (permalink / raw) To: xiyou.wangcong; +Cc: netdev, andreyknvl, edumazet From: Cong Wang <xiyou.wangcong@gmail.com> Date: Thu, 4 May 2017 14:54:17 -0700 > IPv4 dst could use fi->fib_metrics to store metrics but fib_info > itself is refcnt'ed, so without taking a refcnt fi and > fi->fib_metrics could be freed while dst metrics still points to > it. This triggers use-after-free as reported by Andrey twice. > > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to > restore this reference counting. It is a quick fix for -net and > -stable, for -net-next, as Eric suggested, we can consider doing > reference counting for metrics itself instead of relying on fib_info. > > IPv6 is very different, it copies or steals the metrics from mx6_config > in fib6_commit_metrics() so probably doesn't need a refcnt. > > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > Reported-by: Andrey Konovalov <andreyknvl@google.com> > Tested-by: Andrey Konovalov <andreyknvl@google.com> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Applied and queued up for -stable, thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-08 18:35 ` David Miller @ 2017-05-09 0:01 ` Eric Dumazet 2017-05-09 1:22 ` David Miller 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 0:01 UTC (permalink / raw) To: David Miller; +Cc: xiyou.wangcong, netdev, andreyknvl, edumazet On Mon, 2017-05-08 at 14:35 -0400, David Miller wrote: > From: Cong Wang <xiyou.wangcong@gmail.com> > Date: Thu, 4 May 2017 14:54:17 -0700 > > > IPv4 dst could use fi->fib_metrics to store metrics but fib_info > > itself is refcnt'ed, so without taking a refcnt fi and > > fi->fib_metrics could be freed while dst metrics still points to > > it. This triggers use-after-free as reported by Andrey twice. > > > > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to > > restore this reference counting. It is a quick fix for -net and > > -stable, for -net-next, as Eric suggested, we can consider doing > > reference counting for metrics itself instead of relying on fib_info. > > > > IPv6 is very different, it copies or steals the metrics from mx6_config > > in fib6_commit_metrics() so probably doesn't need a refcnt. > > > > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). > > > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > > Reported-by: Andrey Konovalov <andreyknvl@google.com> > > Tested-by: Andrey Konovalov <andreyknvl@google.com> > > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > > Applied and queued up for -stable, thanks. Although I now have on latest net tree these messages when I reboot my test machine. [ 224.085873] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 234.106018] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 244.366187] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 254.626325] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 264.706472] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 274.736619] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 [ 284.766766] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 0:01 ` Eric Dumazet @ 2017-05-09 1:22 ` David Miller 2017-05-09 2:18 ` Eric Dumazet 0 siblings, 1 reply; 43+ messages in thread From: David Miller @ 2017-05-09 1:22 UTC (permalink / raw) To: eric.dumazet; +Cc: xiyou.wangcong, netdev, andreyknvl, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 08 May 2017 17:01:20 -0700 > On Mon, 2017-05-08 at 14:35 -0400, David Miller wrote: >> From: Cong Wang <xiyou.wangcong@gmail.com> >> Date: Thu, 4 May 2017 14:54:17 -0700 >> >> > IPv4 dst could use fi->fib_metrics to store metrics but fib_info >> > itself is refcnt'ed, so without taking a refcnt fi and >> > fi->fib_metrics could be freed while dst metrics still points to >> > it. This triggers use-after-free as reported by Andrey twice. >> > >> > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to >> > restore this reference counting. It is a quick fix for -net and >> > -stable, for -net-next, as Eric suggested, we can consider doing >> > reference counting for metrics itself instead of relying on fib_info. >> > >> > IPv6 is very different, it copies or steals the metrics from mx6_config >> > in fib6_commit_metrics() so probably doesn't need a refcnt. >> > >> > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). >> > >> > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") >> > Reported-by: Andrey Konovalov <andreyknvl@google.com> >> > Tested-by: Andrey Konovalov <andreyknvl@google.com> >> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> >> >> Applied and queued up for -stable, thanks. > > Although I now have on latest net tree these messages when I reboot my > test machine. > > [ 224.085873] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 Strange, the refcounting looks quite OK in the patch you're quoting. I looked over it a few times and cannot figure out a possible cause there. I am assuming you are quite confident it is this change? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 1:22 ` David Miller @ 2017-05-09 2:18 ` Eric Dumazet 2017-05-09 2:35 ` David Miller 2017-05-09 16:44 ` Cong Wang 0 siblings, 2 replies; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 2:18 UTC (permalink / raw) To: David Miller; +Cc: xiyou.wangcong, netdev, andreyknvl, edumazet On Mon, 2017-05-08 at 21:22 -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 08 May 2017 17:01:20 -0700 > > > On Mon, 2017-05-08 at 14:35 -0400, David Miller wrote: > >> From: Cong Wang <xiyou.wangcong@gmail.com> > >> Date: Thu, 4 May 2017 14:54:17 -0700 > >> > >> > IPv4 dst could use fi->fib_metrics to store metrics but fib_info > >> > itself is refcnt'ed, so without taking a refcnt fi and > >> > fi->fib_metrics could be freed while dst metrics still points to > >> > it. This triggers use-after-free as reported by Andrey twice. > >> > > >> > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to > >> > restore this reference counting. It is a quick fix for -net and > >> > -stable, for -net-next, as Eric suggested, we can consider doing > >> > reference counting for metrics itself instead of relying on fib_info. > >> > > >> > IPv6 is very different, it copies or steals the metrics from mx6_config > >> > in fib6_commit_metrics() so probably doesn't need a refcnt. > >> > > >> > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). > >> > > >> > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > >> > Reported-by: Andrey Konovalov <andreyknvl@google.com> > >> > Tested-by: Andrey Konovalov <andreyknvl@google.com> > >> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> > >> > >> Applied and queued up for -stable, thanks. > > > > Although I now have on latest net tree these messages when I reboot my > > test machine. > > > > [ 224.085873] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 > > Strange, the refcounting looks quite OK in the patch you're quoting. > I looked over it a few times and cannot figure out a possible cause > there. > > I am assuming you are quite confident it is this change? At least, reverting the patch resolves the issue for me. Keeping fib (and their reference to netdev) is apparently too much, we probably need to implement a refcount on the metrics themselves, being stand alone objects. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 2:18 ` Eric Dumazet @ 2017-05-09 2:35 ` David Miller 2017-05-09 16:44 ` Cong Wang 1 sibling, 0 replies; 43+ messages in thread From: David Miller @ 2017-05-09 2:35 UTC (permalink / raw) To: eric.dumazet; +Cc: xiyou.wangcong, netdev, andreyknvl, edumazet From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 08 May 2017 19:18:22 -0700 > Keeping fib (and their reference to netdev) is apparently too much, > we probably need to implement a refcount on the metrics themselves, > being stand alone objects. I think I see the problem, when we flush the dst cache on device unregister, we point the dst at loopback but do not flush metrics. I'm going to revert. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 2:18 ` Eric Dumazet 2017-05-09 2:35 ` David Miller @ 2017-05-09 16:44 ` Cong Wang 2017-05-09 16:56 ` Eric Dumazet 1 sibling, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-09 16:44 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Mon, May 8, 2017 at 7:18 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2017-05-08 at 21:22 -0400, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Mon, 08 May 2017 17:01:20 -0700 >> >> > On Mon, 2017-05-08 at 14:35 -0400, David Miller wrote: >> >> From: Cong Wang <xiyou.wangcong@gmail.com> >> >> Date: Thu, 4 May 2017 14:54:17 -0700 >> >> >> >> > IPv4 dst could use fi->fib_metrics to store metrics but fib_info >> >> > itself is refcnt'ed, so without taking a refcnt fi and >> >> > fi->fib_metrics could be freed while dst metrics still points to >> >> > it. This triggers use-after-free as reported by Andrey twice. >> >> > >> >> > This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to >> >> > restore this reference counting. It is a quick fix for -net and >> >> > -stable, for -net-next, as Eric suggested, we can consider doing >> >> > reference counting for metrics itself instead of relying on fib_info. >> >> > >> >> > IPv6 is very different, it copies or steals the metrics from mx6_config >> >> > in fib6_commit_metrics() so probably doesn't need a refcnt. >> >> > >> >> > Decnet has already done the refcnt'ing, see dn_fib_semantic_match(). >> >> > >> >> > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") >> >> > Reported-by: Andrey Konovalov <andreyknvl@google.com> >> >> > Tested-by: Andrey Konovalov <andreyknvl@google.com> >> >> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> >> >> >> >> Applied and queued up for -stable, thanks. >> > >> > Although I now have on latest net tree these messages when I reboot my >> > test machine. >> > >> > [ 224.085873] unregister_netdevice: waiting for eth0 to become free. Usage count = 43 >> >> Strange, the refcounting looks quite OK in the patch you're quoting. >> I looked over it a few times and cannot figure out a possible cause >> there. Eric, how did you produce it? I guess it's because of nh_dev which is the only netdevice pointer inside fib_info. Let me take a deeper look. >> >> I am assuming you are quite confident it is this change? > > At least, reverting the patch resolves the issue for me. > > Keeping fib (and their reference to netdev) is apparently too much, > we probably need to implement a refcount on the metrics themselves, > being stand alone objects. I don't disagree, just that it may need to change too much code which goes beyond a stable candidate. Thanks for the bug report! ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 16:44 ` Cong Wang @ 2017-05-09 16:56 ` Eric Dumazet 2017-05-09 20:56 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 16:56 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 09:44 -0700, Cong Wang wrote: > > Eric, how did you produce it? > I guess it's because of nh_dev which is the only netdevice pointer inside > fib_info. Let me take a deeper look. > Nothing particular, I am using kexec to boot new kernels, and all my attempts with your patch included demonstrated the issue. eth0 is a bonding device, it might matter, I do not know. We also have some tunnels, but unfortunately I can not provide a setup that you could use on say a VM. I can send you the .config if this can help > >> > >> I am assuming you are quite confident it is this change? > > > > At least, reverting the patch resolves the issue for me. > > > > Keeping fib (and their reference to netdev) is apparently too much, > > we probably need to implement a refcount on the metrics themselves, > > being stand alone objects. > > I don't disagree, just that it may need to change too much code which > goes beyond a stable candidate. Well, your choice, but dealing with a full blown fib and its dependencies look fragile to me. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 16:56 ` Eric Dumazet @ 2017-05-09 20:56 ` Cong Wang 2017-05-09 22:07 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-09 20:56 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 9, 2017 at 9:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2017-05-09 at 09:44 -0700, Cong Wang wrote: > >> >> Eric, how did you produce it? >> I guess it's because of nh_dev which is the only netdevice pointer inside >> fib_info. Let me take a deeper look. >> > > Nothing particular, I am using kexec to boot new kernels, and all my > attempts with your patch included demonstrated the issue. Interesting. So this happens on your eth0 teardown path, but I don't see any problem in the related NETDEV_UNREGISTER notifiers yet, we don't call dst_destroy() on this path and will defer it to GC, but that should not be delayed for so long? Wait... if we transfer dst->dev to loopback_dev because we don't want to block unregister path, then we might have a similar problem for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still hold the dev references... Something like this (just for proof of concept): diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index da449dd..85cd614 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -205,7 +205,7 @@ static void free_fib_info_rcu(struct rcu_head *head) struct fib_info *fi = container_of(head, struct fib_info, rcu); change_nexthops(fi) { - if (nexthop_nh->nh_dev) + if (nexthop_nh->nh_dev && !(nexthop_nh->nh_flags & RTNH_F_DEAD)) dev_put(nexthop_nh->nh_dev); lwtstate_put(nexthop_nh->nh_lwtstate); free_nh_exceptions(nexthop_nh); @@ -1414,8 +1414,10 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force) else if (nexthop_nh->nh_dev == dev && nexthop_nh->nh_scope != scope) { switch (event) { - case NETDEV_DOWN: case NETDEV_UNREGISTER: + dev_put(dev); + /* fall through */ + case NETDEV_DOWN: nexthop_nh->nh_flags |= RTNH_F_DEAD; /* fall through */ case NETDEV_CHANGE: ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 20:56 ` Cong Wang @ 2017-05-09 22:07 ` Cong Wang 2017-05-09 22:52 ` Eric Dumazet 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-09 22:07 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet [-- Attachment #1: Type: text/plain, Size: 397 bytes --] On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > Wait... if we transfer dst->dev to loopback_dev because we don't > want to block unregister path, then we might have a similar problem > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still > hold the dev references... > I finally come up with the attach patch... Do you mind to give it a try? [-- Attachment #2: ipv4-rt-fi-ref-count.diff --] [-- Type: text/plain, Size: 2821 bytes --] commit a431b4d969f617c5ef8711b6bf493199eecec759 Author: Cong Wang <xiyou.wangcong@gmail.com> Date: Tue May 9 14:35:10 2017 -0700 ipv4: restore rt->fi for reference counting, try #2 Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> diff --git a/include/net/route.h b/include/net/route.h index 2cc0e14..4335eb7 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -69,6 +69,7 @@ struct rtable { struct list_head rt_uncached; struct uncached_list *rt_uncached_list; + struct fib_info *fi; /* for refcnt to shared metrics */ }; static inline bool rt_is_input_route(const struct rtable *rt) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index da449dd..d77c453 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -1429,8 +1429,15 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force) #ifdef CONFIG_IP_ROUTE_MULTIPATH if (event == NETDEV_UNREGISTER && nexthop_nh->nh_dev == dev) { + /* This should be fine, we are on unregister + * path so synchronize_net() already waits for + * existing readers. We have to release the + * dev here because dst could still hold this + * fib_info via rt->fi, we can't wait for GC. + */ + RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL); + dev_put(dev); dead = fi->fib_nhs; - break; } #endif } endfor_nexthops(fi) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 655d9ee..f647310 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst) { struct rtable *rt = (struct rtable *) dst; + if (rt->fi) { + fib_info_put(rt->fi); + rt->fi = NULL; + } + if (!list_empty(&rt->rt_uncached)) { struct uncached_list *ul = rt->rt_uncached_list; @@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt) !rt_is_expired(rt); } +static void rt_init_metrics(struct rtable *rt, struct fib_info *fi) +{ + if (fi->fib_metrics != (u32 *)dst_default_metrics) { + fib_info_hold(fi); + rt->fi = fi; + } + + dst_init_metrics(&rt->dst, fi->fib_metrics, true); +} + static void rt_set_nexthop(struct rtable *rt, __be32 daddr, const struct fib_result *res, struct fib_nh_exception *fnhe, @@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr, rt->rt_gateway = nh->nh_gw; rt->rt_uses_gateway = 1; } - dst_init_metrics(&rt->dst, fi->fib_metrics, true); + rt_init_metrics(rt, fi); #ifdef CONFIG_IP_ROUTE_CLASSID rt->dst.tclassid = nh->nh_tclassid; #endif @@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev, rt->rt_gateway = 0; rt->rt_uses_gateway = 0; rt->rt_table_id = 0; + rt->fi = NULL; INIT_LIST_HEAD(&rt->rt_uncached); rt->dst.output = ip_output; ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 22:07 ` Cong Wang @ 2017-05-09 22:52 ` Eric Dumazet 2017-05-09 22:54 ` Eric Dumazet 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 22:52 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote: > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > Wait... if we transfer dst->dev to loopback_dev because we don't > > want to block unregister path, then we might have a similar problem > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still > > hold the dev references... > > > > I finally come up with the attach patch... Do you mind to give it a try? I will, but this might be delayed by a few hours. In the mean time, it looks like you could try adding the following to your .config ;) CONFIG_IP_ROUTE_MULTIPATH=y ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 22:52 ` Eric Dumazet @ 2017-05-09 22:54 ` Eric Dumazet 2017-05-09 23:09 ` Eric Dumazet 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 22:54 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote: > On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote: > > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > Wait... if we transfer dst->dev to loopback_dev because we don't > > > want to block unregister path, then we might have a similar problem > > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still > > > hold the dev references... > > > > > > > I finally come up with the attach patch... Do you mind to give it a try? > > I will, but this might be delayed by a few hours. > > In the mean time, it looks like you could try adding the following to > your .config ;) > > CONFIG_IP_ROUTE_MULTIPATH=y > > + /* This should be fine, we are on unregister + * path so synchronize_net() already waits for + * existing readers. We have to release the + * dev here because dst could still hold this + * fib_info via rt->fi, we can't wait for GC. + */ + RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL); + dev_put(dev); dead = fi->fib_nhs; dead = fi->fib_mhs looks wrong if you remove the break; statement ? - break; } ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 22:54 ` Eric Dumazet @ 2017-05-09 23:09 ` Eric Dumazet 2017-05-09 23:35 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 23:09 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 15:54 -0700, Eric Dumazet wrote: > On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote: > > On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote: > > > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > > Wait... if we transfer dst->dev to loopback_dev because we don't > > > > want to block unregister path, then we might have a similar problem > > > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still > > > > hold the dev references... > > > > > > > > > > I finally come up with the attach patch... Do you mind to give it a try? > > > > I will, but this might be delayed by a few hours. > > > > In the mean time, it looks like you could try adding the following to > > your .config ;) > > > > CONFIG_IP_ROUTE_MULTIPATH=y > > > > > > + /* This should be fine, we are on unregister > + * path so synchronize_net() already waits for > + * existing readers. We have to release the > + * dev here because dst could still hold this > + * fib_info via rt->fi, we can't wait for GC. > + */ > + RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL); > + dev_put(dev); > dead = fi->fib_nhs; > > dead = fi->fib_mhs looks wrong if you remove the break; statement ? > > - break; Also setting nexthop_nh->nh_dev to NULL looks quite dangerous We have plenty of sites doing : if (fi->fib_dev) x = fi->fib_dev->field fib_route_seq_show() is one example. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:09 ` Eric Dumazet @ 2017-05-09 23:35 ` Cong Wang 2017-05-09 23:50 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Cong Wang @ 2017-05-09 23:35 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 9, 2017 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2017-05-09 at 15:54 -0700, Eric Dumazet wrote: >> On Tue, 2017-05-09 at 15:52 -0700, Eric Dumazet wrote: >> > On Tue, 2017-05-09 at 15:07 -0700, Cong Wang wrote: >> > > On Tue, May 9, 2017 at 1:56 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> > > > Wait... if we transfer dst->dev to loopback_dev because we don't >> > > > want to block unregister path, then we might have a similar problem >> > > > for rt->fi too, fib_info is still referenced by dst, so these nh_dev's still >> > > > hold the dev references... >> > > > >> > > >> > > I finally come up with the attach patch... Do you mind to give it a try? >> > >> > I will, but this might be delayed by a few hours. >> > >> > In the mean time, it looks like you could try adding the following to >> > your .config ;) >> > >> > CONFIG_IP_ROUTE_MULTIPATH=y >> > >> > >> >> + /* This should be fine, we are on unregister >> + * path so synchronize_net() already waits for >> + * existing readers. We have to release the >> + * dev here because dst could still hold this >> + * fib_info via rt->fi, we can't wait for GC. >> + */ >> + RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL); >> + dev_put(dev); >> dead = fi->fib_nhs; >> >> dead = fi->fib_mhs looks wrong if you remove the break; statement ? >> >> - break; This statement is only used to ensure we pass the "dead == fi->fib_nhs" check right below the inner loop, it is fine to keep it without break since fi is not changed in the inner loop. > > Also setting nexthop_nh->nh_dev to NULL looks quite dangerous > > We have plenty of sites doing : > > if (fi->fib_dev) > x = fi->fib_dev->field > > fib_route_seq_show() is one example. > All of them take RCU read lock, so, as I explained in the code comment, they all should be fine because of synchronize_net() on unregister path. Do you see anything otherwise? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:35 ` Cong Wang @ 2017-05-09 23:50 ` Eric Dumazet 2017-05-10 16:32 ` Cong Wang 2017-05-09 23:51 ` Eric Dumazet 2017-05-10 7:38 ` Julian Anastasov 2 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 23:50 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > All of them take RCU read lock, so, as I explained in the code comment, > they all should be fine because of synchronize_net() on unregister path. > Do you see anything otherwise? They might take rcu lock, but compiler is still allowed to read fi->fib_dev multiple times, and crashes might happen. You will need to audit all code and fix it, using proper rcu_dereference() or similar code ensuring compiler wont do stupid things. Like : diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c index 1201409ba1dcb18ee028003b065410b87bf4a602..ab69517befbb5f300af785fbb20071a3d1086593 100644 --- a/net/ipv4/fib_trie.c +++ b/net/ipv4/fib_trie.c @@ -2666,11 +2666,13 @@ static int fib_route_seq_show(struct seq_file *seq, void *v) seq_setwidth(seq, 127); - if (fi) + if (fi) { + struct net_device *dev = rcu_dereference(fi->fib_dev); + seq_printf(seq, "%s\t%08X\t%08X\t%04X\t%d\t%u\t" "%d\t%08X\t%d\t%u\t%u", - fi->fib_dev ? fi->fib_dev->name : "*", + dev ? dev->name : "*", prefix, fi->fib_nh->nh_gw, flags, 0, 0, fi->fib_priority, @@ -2679,13 +2681,13 @@ static int fib_route_seq_show(struct seq_file *seq, void *v) fi->fib_advmss + 40 : 0), fi->fib_window, fi->fib_rtt >> 3); - else + } else { seq_printf(seq, "*\t%08X\t%08X\t%04X\t%d\t%u\t" "%d\t%08X\t%d\t%u\t%u", prefix, 0, flags, 0, 0, 0, mask, 0, 0, 0); - + } seq_pad(seq, '\n'); } ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:50 ` Eric Dumazet @ 2017-05-10 16:32 ` Cong Wang 0 siblings, 0 replies; 43+ messages in thread From: Cong Wang @ 2017-05-10 16:32 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 9, 2017 at 4:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > >> All of them take RCU read lock, so, as I explained in the code comment, >> they all should be fine because of synchronize_net() on unregister path. >> Do you see anything otherwise? > > They might take rcu lock, but compiler is still allowed to read > fi->fib_dev multiple times, and crashes might happen. > > You will need to audit all code and fix it, using proper > rcu_dereference() or similar code ensuring compiler wont do stupid > things. > Point taken. But without my patch, nh_dev is supposed to be protected by RCU too, it is freed in a rcu callback and dereferenced like: struct in_device *in_dev = __in_dev_get_rcu(nh->nh_dev); ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:35 ` Cong Wang 2017-05-09 23:50 ` Eric Dumazet @ 2017-05-09 23:51 ` Eric Dumazet 2017-05-10 16:40 ` Cong Wang 2017-05-10 7:38 ` Julian Anastasov 2 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-09 23:51 UTC (permalink / raw) To: Cong Wang Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > This statement is only used to ensure we pass the "dead == fi->fib_nhs" > check right below the inner loop, it is fine to keep it without break since > fi is not changed in the inner loop. > So the dead++ above wont end up with (dead > fi->fib_nhs) ? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:51 ` Eric Dumazet @ 2017-05-10 16:40 ` Cong Wang 0 siblings, 0 replies; 43+ messages in thread From: Cong Wang @ 2017-05-10 16:40 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 9, 2017 at 4:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2017-05-09 at 16:35 -0700, Cong Wang wrote: > >> This statement is only used to ensure we pass the "dead == fi->fib_nhs" >> check right below the inner loop, it is fine to keep it without break since >> fi is not changed in the inner loop. >> > > So the dead++ above wont end up with (dead > fi->fib_nhs) ? Good point, it could happen, we probably need another boolean to address this. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-09 23:35 ` Cong Wang 2017-05-09 23:50 ` Eric Dumazet 2017-05-09 23:51 ` Eric Dumazet @ 2017-05-10 7:38 ` Julian Anastasov 2017-05-10 17:00 ` Cong Wang 2 siblings, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-10 7:38 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Tue, 9 May 2017, Cong Wang wrote: > > Also setting nexthop_nh->nh_dev to NULL looks quite dangerous > > > > We have plenty of sites doing : > > > > if (fi->fib_dev) > > x = fi->fib_dev->field > > > > fib_route_seq_show() is one example. > > > > All of them take RCU read lock, so, as I explained in the code comment, > they all should be fine because of synchronize_net() on unregister path. > Do you see anything otherwise? During NETDEV_UNREGISTER packets for dev should not be flying but packets for other devs can walk the nexthops for multipath routes. It is the rcu_barrier before NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL during this grace period but there are many places to fix that assume nh_dev!=NULL. But why we leak routes? Is there some place that holds routes without listening for NETDEV_UNREGISTER? On fib_flush the infos are unlinked from trees, so after a grace period packets should not see/hold such infos. If we hold routes somewhere for long time, problem can happen also for routes with single nexthop. Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-10 7:38 ` Julian Anastasov @ 2017-05-10 17:00 ` Cong Wang 2017-05-10 19:51 ` Julian Anastasov 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-10 17:00 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Wed, May 10, 2017 at 12:38 AM, Julian Anastasov <ja@ssi.bg> wrote: > > Hello, > > On Tue, 9 May 2017, Cong Wang wrote: > >> > Also setting nexthop_nh->nh_dev to NULL looks quite dangerous >> > >> > We have plenty of sites doing : >> > >> > if (fi->fib_dev) >> > x = fi->fib_dev->field >> > >> > fib_route_seq_show() is one example. >> > >> >> All of them take RCU read lock, so, as I explained in the code comment, >> they all should be fine because of synchronize_net() on unregister path. >> Do you see anything otherwise? > > During NETDEV_UNREGISTER packets for dev should not > be flying but packets for other devs can walk the nexthops > for multipath routes. It is the rcu_barrier before > NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL > during this grace period but there are many places to fix that > assume nh_dev!=NULL. Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not yet captured by the fib event notifier. I think readers are still safe to assume nh_dev!=NULL since we wait for existing readers, readers coming later can't see it any more. Or am I missing anything? > > But why we leak routes? Is there some place that holds > routes without listening for NETDEV_UNREGISTER? On fib_flush > the infos are unlinked from trees, so after a grace period packets > should not see/hold such infos. If we hold routes somewhere for > long time, problem can happen also for routes with single nexthop. My theory is that dst which holds a refcnt to fib_info (of course, after this patch) is moved in GC after NETDEV_UNREGISTER but still referenced somewhere, it therefore holds these nh_dev's, which in turn hold back to the netdevice which is being unregistered, thus Eric saw these refcnt leak warnings. I am not sure if sitting in GC for a long time is problematic or not, but from the code where we transfer dst->dev to loopback_dev, it seems this is expected otherwise no need to transfer? But I don't dig the history though. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-10 17:00 ` Cong Wang @ 2017-05-10 19:51 ` Julian Anastasov 2017-05-12 0:07 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-10 19:51 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Wed, 10 May 2017, Cong Wang wrote: > On Wed, May 10, 2017 at 12:38 AM, Julian Anastasov <ja@ssi.bg> wrote: > > > > During NETDEV_UNREGISTER packets for dev should not > > be flying but packets for other devs can walk the nexthops > > for multipath routes. It is the rcu_barrier before > > NETDEV_UNREGISTER_FINAL that allows nh_dev to be set to NULL > > during this grace period but there are many places to fix that > > assume nh_dev!=NULL. > > Excellent point. Unfortunately NETDEV_UNREGISTER_FINAL is not > yet captured by the fib event notifier. > > I think readers are still safe to assume nh_dev!=NULL since we wait > for existing readers, readers coming later can't see it any more. Or > am I missing anything? It is not safe because other devs can deliver packets that still can see the multipath fib info in the FIB nodes. CPU 1 (unreg dev) CPU 2 (packet to another dev2) NETDEV_DOWN => set RTNH_F_DEAD Now traffic sent via downed dev should stop flush_all_backlogs synchronize_net Now received traffic for dev should stop on all CPUs fib_sync_down_dev nh_dev = NULL fib_table_lookup: boom in __in_dev_get_rcu(nh->nh_dev), all nh_dev dereferences must be checked. If checked, it should be safe to use this nh_dev due to the rcu_barrier in netdev_run_todo. But only the structure can be accessed, traffic should not go via unreged dev. fib_flush: unlink fi from fa_info in fib_table_flush Now other CPUs will not see this fi on lookups (after fib_table_flush) netdev_run_todo rcu_barrier Now other CPUs should not send packets via this fib info NETDEV_UNREGISTER_FINAL Notes: - when NETDEV_DOWN is delivered fib_sync_down_dev sets RTNH_F_DEAD but only for link routes, not for host routes - fib_table_lookup runs without any memory ordering barriers, when CPU 2 delivers packet from different dev it can hit a multipath route with nh_dev set to NULL. Not so often if RTNH_F_DEAD was set early and properly checked before dereferencing nh_dev. > > But why we leak routes? Is there some place that holds > > routes without listening for NETDEV_UNREGISTER? On fib_flush > > the infos are unlinked from trees, so after a grace period packets > > should not see/hold such infos. If we hold routes somewhere for > > long time, problem can happen also for routes with single nexthop. > > My theory is that dst which holds a refcnt to fib_info (of course, after > this patch) is moved in GC after NETDEV_UNREGISTER but still > referenced somewhere, it therefore holds these nh_dev's, which > in turn hold back to the netdevice which is being unregistered, thus > Eric saw these refcnt leak warnings. Oh, well, the sockets can hold cached dst. But if the promise is that rt->fi is used only as reference to metrics we have to find a way to drop the dev references at NETDEV_UNREGISTER time. If you set nh_dev to NULL then all lookups should check it for != NULL. The sockets will not walk NHs via rt->fi, i.e. the route lookups will get valid res.fi from trees, so it may work in this way. Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-10 19:51 ` Julian Anastasov @ 2017-05-12 0:07 ` Cong Wang 2017-05-12 1:22 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-12 0:07 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Wed, May 10, 2017 at 12:51 PM, Julian Anastasov <ja@ssi.bg> wrote: > Oh, well, the sockets can hold cached dst. > But if the promise is that rt->fi is used only as > reference to metrics we have to find a way to drop > the dev references at NETDEV_UNREGISTER time. If you > set nh_dev to NULL then all lookups should check it > for != NULL. The sockets will not walk NHs via rt->fi, > i.e. the route lookups will get valid res.fi from trees, > so it may work in this way. > So, if I understand you correctly it is safe to NULL'ing nh_dev in NETDEV_UNREGISTER_FINAL, right? If still not, how about transfer nh_dev's to loopback_dev too in NETDEV_UNREGISTER? Like we transfer dst->dev. I don't want to touch the fast path to check for NULL, as it will change more code and slow down performance. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 0:07 ` Cong Wang @ 2017-05-12 1:22 ` Cong Wang 2017-05-12 4:55 ` Eric Dumazet 2017-05-12 6:39 ` Julian Anastasov 0 siblings, 2 replies; 43+ messages in thread From: Cong Wang @ 2017-05-12 1:22 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet [-- Attachment #1: Type: text/plain, Size: 502 bytes --] On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > So, if I understand you correctly it is safe to NULL'ing > nh_dev in NETDEV_UNREGISTER_FINAL, right? > > If still not, how about transfer nh_dev's to loopback_dev > too in NETDEV_UNREGISTER? Like we transfer dst->dev. > > I don't want to touch the fast path to check for NULL, as > it will change more code and slow down performance. Finally I come up with the attached patch. Please let me know if I still miss anything. [-- Attachment #2: ipv4-rt-fi-ref-count.diff --] [-- Type: text/plain, Size: 4364 bytes --] commit edc38ecab7101487b35fc9152e166a2670467e49 Author: Cong Wang <xiyou.wangcong@gmail.com> Date: Tue May 9 14:35:10 2017 -0700 ipv4: restore rt->fi for reference counting, try #2 Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 6692c57..0f04f4d 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -392,6 +392,7 @@ int fib_unmerge(struct net *net); /* Exported by fib_semantics.c */ int ip_fib_check_default(__be32 gw, struct net_device *dev); int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force); +void fib_put_nh_devs(struct net_device *dev); int fib_sync_down_addr(struct net_device *dev, __be32 local); int fib_sync_up(struct net_device *dev, unsigned int nh_flags); diff --git a/include/net/route.h b/include/net/route.h index 2cc0e14..4335eb7 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -69,6 +69,7 @@ struct rtable { struct list_head rt_uncached; struct uncached_list *rt_uncached_list; + struct fib_info *fi; /* for refcnt to shared metrics */ }; static inline bool rt_is_input_route(const struct rtable *rt) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 39bd1ed..59b0a1d 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1178,6 +1178,9 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo fib_disable_ip(dev, event, true); rt_flush_dev(dev); return NOTIFY_DONE; + } else if (event == NETDEV_UNREGISTER_FINAL) { + fib_put_nh_devs(dev); + return NOTIFY_DONE; } in_dev = __in_dev_get_rtnl(dev); diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index da449dd..b85c5bb 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -1453,6 +1453,36 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force) return ret; } +/* We have to release these nh_dev here because a dst could still hold a + * fib_info via rt->fi, we can't wait for GC, a socket could hold the dst + * for a long time. + */ +void fib_put_nh_devs(struct net_device *dev) +{ + struct fib_info *prev_fi = NULL; + unsigned int hash = fib_devindex_hashfn(dev->ifindex); + struct hlist_head *head = &fib_info_devhash[hash]; + struct fib_nh *nh; + + hlist_for_each_entry(nh, head, nh_hash) { + struct fib_info *fi = nh->nh_parent; + + if (nh->nh_dev != dev || fi == prev_fi) + continue; + prev_fi = fi; + change_nexthops(fi) { + if (nexthop_nh->nh_dev == dev) { + /* This should be safe, we are on unregister + * path, after synchronize_net() and + * rcu_barrier(), no one could see this. + */ + RCU_INIT_POINTER(nexthop_nh->nh_dev, NULL); + dev_put(dev); + } + } endfor_nexthops(fi) + } +} + /* Must be invoked inside of an RCU protected region. */ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res) { diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 655d9ee..f647310 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1387,6 +1387,11 @@ static void ipv4_dst_destroy(struct dst_entry *dst) { struct rtable *rt = (struct rtable *) dst; + if (rt->fi) { + fib_info_put(rt->fi); + rt->fi = NULL; + } + if (!list_empty(&rt->rt_uncached)) { struct uncached_list *ul = rt->rt_uncached_list; @@ -1424,6 +1429,16 @@ static bool rt_cache_valid(const struct rtable *rt) !rt_is_expired(rt); } +static void rt_init_metrics(struct rtable *rt, struct fib_info *fi) +{ + if (fi->fib_metrics != (u32 *)dst_default_metrics) { + fib_info_hold(fi); + rt->fi = fi; + } + + dst_init_metrics(&rt->dst, fi->fib_metrics, true); +} + static void rt_set_nexthop(struct rtable *rt, __be32 daddr, const struct fib_result *res, struct fib_nh_exception *fnhe, @@ -1438,7 +1453,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr, rt->rt_gateway = nh->nh_gw; rt->rt_uses_gateway = 1; } - dst_init_metrics(&rt->dst, fi->fib_metrics, true); + rt_init_metrics(rt, fi); #ifdef CONFIG_IP_ROUTE_CLASSID rt->dst.tclassid = nh->nh_tclassid; #endif @@ -1490,6 +1505,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev, rt->rt_gateway = 0; rt->rt_uses_gateway = 0; rt->rt_table_id = 0; + rt->fi = NULL; INIT_LIST_HEAD(&rt->rt_uncached); rt->dst.output = ip_output; ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 1:22 ` Cong Wang @ 2017-05-12 4:55 ` Eric Dumazet 2017-05-12 17:49 ` Cong Wang 2017-05-12 6:39 ` Julian Anastasov 1 sibling, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-12 4:55 UTC (permalink / raw) To: Cong Wang Cc: Julian Anastasov, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Thu, 2017-05-11 at 18:22 -0700, Cong Wang wrote: > On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > So, if I understand you correctly it is safe to NULL'ing > > nh_dev in NETDEV_UNREGISTER_FINAL, right? > > > > If still not, how about transfer nh_dev's to loopback_dev > > too in NETDEV_UNREGISTER? Like we transfer dst->dev. > > > > I don't want to touch the fast path to check for NULL, as > > it will change more code and slow down performance. > > Finally I come up with the attached patch. Please let me know if > I still miss anything. You have not addressed my prior feedback, unless I am mistaken ? fib_route_seq_show() and others do not expect fi->fib_dev suddenly becoming NULL. RCU contract is more complicated than simply adding rcu grace period before delete. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 4:55 ` Eric Dumazet @ 2017-05-12 17:49 ` Cong Wang 0 siblings, 0 replies; 43+ messages in thread From: Cong Wang @ 2017-05-12 17:49 UTC (permalink / raw) To: Eric Dumazet Cc: Julian Anastasov, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Thu, May 11, 2017 at 9:55 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2017-05-11 at 18:22 -0700, Cong Wang wrote: >> On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> > So, if I understand you correctly it is safe to NULL'ing >> > nh_dev in NETDEV_UNREGISTER_FINAL, right? >> > >> > If still not, how about transfer nh_dev's to loopback_dev >> > too in NETDEV_UNREGISTER? Like we transfer dst->dev. >> > >> > I don't want to touch the fast path to check for NULL, as >> > it will change more code and slow down performance. >> >> Finally I come up with the attached patch. Please let me know if >> I still miss anything. > > You have not addressed my prior feedback, unless I am mistaken ? > > fib_route_seq_show() and others do not expect fi->fib_dev suddenly > becoming NULL. > > RCU contract is more complicated than simply adding rcu grace period > before delete. > If you mean the lack of rcu_dereference() in fib_route_seq_show() , yeah, I don't fix it because I think it should be in a separate patch, it is not this patch which introduces RCU to nh_dev as I explained previously. Or you mean anything else? ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 1:22 ` Cong Wang 2017-05-12 4:55 ` Eric Dumazet @ 2017-05-12 6:39 ` Julian Anastasov 2017-05-12 17:27 ` Cong Wang 1 sibling, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-12 6:39 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Thu, 11 May 2017, Cong Wang wrote: > On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > So, if I understand you correctly it is safe to NULL'ing > > nh_dev in NETDEV_UNREGISTER_FINAL, right? > > > > If still not, how about transfer nh_dev's to loopback_dev > > too in NETDEV_UNREGISTER? Like we transfer dst->dev. > > > > I don't want to touch the fast path to check for NULL, as > > it will change more code and slow down performance. > > Finally I come up with the attached patch. Please let me know if > I still miss anything. fib_flush will unlink the FIB infos at NETDEV_UNREGISTER time, so we can not see them in any hash tables later on NETDEV_UNREGISTER_FINAL. fib_put_nh_devs() can not work except if moving NHs in another hash table but that should not be needed. I'm thinking for the following solution which is a bit hackish: - on NETDEV_UNREGISTER we want to put the nh_dev references, so fib_release_info is a good place. But as fib_release_info is not always called, we will have two places that put references. We can use such hack: - for example, use nh_oif to know if dev_put is already called - fib_release_info() should set nh_oif to 0 because it will now call dev_put without clearing nh_dev - free_fib_info_rcu() will not call dev_put if nh_oif is 0: if (nexthop_nh->nh_dev && nexthop_nh->nh_oif) dev_put(nexthop_nh->nh_dev); - as fi is unlinked, there is no chance fib_info_hashfn() to use the cleared nh_oif for hashing - we expect noone to touch nh_dev fields after fi is unlinked, this includes free_fib_info and free_fib_info_rcu What we achieve: - between NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL nh_dev still points to our dev (and not to loopback, I think, this answers your question in previous email), so route lookups will not find loopback in the FIB tree. We do not want some packets to be wrongly rerouted. Even if we don't hold dev reference, the RCU grace period helps here. - fib_dev/nh_dev != NULL checks are not needed, so this addresses Eric's concerns. BTW, fib_route_seq_show actually checks for fi->fib_dev, may be not in a safe way (lockless_dereference needed?) but as we don't set nh_dev to NULL this is not needed. What more? What about nh_pcpu_rth_output and nh_rth_input holding routes? We should think about them too. I should think more if nh_oif trick can work for them, may be not because nh_oif is optional... May be we should purge them somehow? Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 6:39 ` Julian Anastasov @ 2017-05-12 17:27 ` Cong Wang 2017-05-12 20:58 ` Cong Wang 2017-05-12 21:27 ` Julian Anastasov 0 siblings, 2 replies; 43+ messages in thread From: Cong Wang @ 2017-05-12 17:27 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Thu, May 11, 2017 at 11:39 PM, Julian Anastasov <ja@ssi.bg> wrote: > > Hello, > > On Thu, 11 May 2017, Cong Wang wrote: > >> On Thu, May 11, 2017 at 5:07 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> > So, if I understand you correctly it is safe to NULL'ing >> > nh_dev in NETDEV_UNREGISTER_FINAL, right? >> > >> > If still not, how about transfer nh_dev's to loopback_dev >> > too in NETDEV_UNREGISTER? Like we transfer dst->dev. >> > >> > I don't want to touch the fast path to check for NULL, as >> > it will change more code and slow down performance. >> >> Finally I come up with the attached patch. Please let me know if >> I still miss anything. > > fib_flush will unlink the FIB infos at NETDEV_UNREGISTER > time, so we can not see them in any hash tables later on > NETDEV_UNREGISTER_FINAL. fib_put_nh_devs() can not work > except if moving NHs in another hash table but that should > not be needed. Ah, I did miss the hlist_del() in fib_table_flush(), so we just need some way to link those fib_nh together for NETDEV_UNREGISTER_FINAL, a linked list should be sufficient, but requires adding list_head to fib_nh. > > I'm thinking for the following solution which > is a bit hackish: > > - on NETDEV_UNREGISTER we want to put the nh_dev references, > so fib_release_info is a good place. But as fib_release_info > is not always called, we will have two places that put > references. We can use such hack: > > - for example, use nh_oif to know if dev_put is > already called > > - fib_release_info() should set nh_oif to 0 because > it will now call dev_put without clearing nh_dev Are you sure we are safe to call dev_put() in fib_release_info() for _all_ paths, especially non-unregister paths? See: commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 Author: Yanmin Zhang <yanmin_zhang@linux.intel.com> Date: Wed May 23 15:39:45 2012 +0000 ipv4: fix the rcu race between free_fib_info and ip_route_output_slow But, hmm, very interesting, I always thought dev_put() triggers a kfree() potentially, but here your suggestion actually leverages the fact that it is merely a pcpu counter decrease, so for unregister path, this is just giving refcnt back, which means it is safe as long as we don't have any parallel unregister? We should because of RTNL. I see why you say this is hackish, really it is. ;) We have to ensure the evil dev_put() is only called on unregister path. > > - free_fib_info_rcu() will not call dev_put if nh_oif is 0: > if (nexthop_nh->nh_dev && nexthop_nh->nh_oif) > dev_put(nexthop_nh->nh_dev); > > - as fi is unlinked, there is no chance fib_info_hashfn() > to use the cleared nh_oif for hashing > > - we expect noone to touch nh_dev fields after fi is > unlinked, this includes free_fib_info and free_fib_info_rcu > > What we achieve: > > - between NETDEV_UNREGISTER and NETDEV_UNREGISTER_FINAL nh_dev > still points to our dev (and not to loopback, I think, this > answers your question in previous email), so route lookups > will not find loopback in the FIB tree. We do not want > some packets to be wrongly rerouted. Even if we don't > hold dev reference, the RCU grace period helps here. Thanks to dev_put() for not calling a netdev_freemem(). ;-) > > - fib_dev/nh_dev != NULL checks are not needed, so this addresses > Eric's concerns. BTW, fib_route_seq_show actually checks > for fi->fib_dev, may be not in a safe way (lockless_dereference > needed?) but as we don't set nh_dev to NULL this is not > needed. > I think Eric was complaining about the lack of rcu_dereference() there. > What more? What about nh_pcpu_rth_output and > nh_rth_input holding routes? We should think about > them too. I should think more if nh_oif trick can work > for them, may be not because nh_oif is optional... > May be we should purge them somehow? > Or maybe don't touch nh_oif but using a new flag in nh_flags?? We only need to know if we should call dev_put() in free_fib_info_rcu(). Again, I am still not sure if it is any better than just putting these fib_nh into a linked list. Thanks. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 17:27 ` Cong Wang @ 2017-05-12 20:58 ` Cong Wang 2017-05-12 21:13 ` Cong Wang 2017-05-12 21:27 ` Julian Anastasov 1 sibling, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-12 20:58 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Fri, May 12, 2017 at 10:27 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > Or maybe don't touch nh_oif but using a new flag in > nh_flags?? We only need to know if we should call > dev_put() in free_fib_info_rcu(). > > Again, I am still not sure if it is any better than just > putting these fib_nh into a linked list. > I am afraid we have to use a linked list, because either a flag or nh_oif is per nh, but one nh could have multiple different nh_devs... ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 20:58 ` Cong Wang @ 2017-05-12 21:13 ` Cong Wang 0 siblings, 0 replies; 43+ messages in thread From: Cong Wang @ 2017-05-12 21:13 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Fri, May 12, 2017 at 1:58 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Fri, May 12, 2017 at 10:27 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> Or maybe don't touch nh_oif but using a new flag in >> nh_flags?? We only need to know if we should call >> dev_put() in free_fib_info_rcu(). >> >> Again, I am still not sure if it is any better than just >> putting these fib_nh into a linked list. >> > > I am afraid we have to use a linked list, because either a flag or > nh_oif is per nh, but one nh could have multiple different nh_devs... Scratch that. Each fib_nh has one nh_dev... I am trying to use a flag. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 17:27 ` Cong Wang 2017-05-12 20:58 ` Cong Wang @ 2017-05-12 21:27 ` Julian Anastasov 2017-05-15 18:34 ` Cong Wang 1 sibling, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-12 21:27 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Fri, 12 May 2017, Cong Wang wrote: > On Thu, May 11, 2017 at 11:39 PM, Julian Anastasov <ja@ssi.bg> wrote: > > > > fib_flush will unlink the FIB infos at NETDEV_UNREGISTER > > time, so we can not see them in any hash tables later on > > NETDEV_UNREGISTER_FINAL. fib_put_nh_devs() can not work > > except if moving NHs in another hash table but that should > > not be needed. > > Ah, I did miss the hlist_del() in fib_table_flush(), so we just need some > way to link those fib_nh together for NETDEV_UNREGISTER_FINAL, > a linked list should be sufficient, but requires adding list_head to fib_nh. It could be a fib_info_devhash_dead hash table, similar to fib_info_devhash where we can hash NHs for dead fib infos but then we will need a fib_clntref reference or otherwise last user can drop the fib info before NETDEV_UNREGISTER_FINAL is called. It will add more code in fib_sync_down_dev to know when to call fib_info_hold. But OTOH, it will reduce the work needed for careful release of the references in fib_release_info. So, we have 2 possible cases to consider: 1. route deleted, fib_release_info called, fi held by socket, NETDEV_UNREGISTER can not see the NHs because they are unlinked in fib_release_info. dev unreg delayed for long time... 2. NETDEV_UNREGISTER called first, before the NHs are unlinked. Now the main question: is FIB_LOOKUP_NOREF used everywhere in IPv4? I guess so. If not, it means someone can walk its res->fi NHs which is bad. I think, this will delay the unregistration for long time and we can not solve the problem. If yes, free_fib_info() should not use call_rcu. Instead, fib_release_info() will start RCU callback to drop everything via a common function for fib_release_info and free_fib_info. As result, the last fib_info_put will just need to free fi->fib_metrics and fi. Something like: /* Long living data */ fib_info_destroy: { if (fi->fib_dead == 0) { pr_warn("Freeing alive fib_info %p\n", fi); return; } if (fi->fib_metrics != (u32 *) dst_default_metrics) kfree(fi->fib_metrics); kfree(fi); } /* Do not imply RCU grace period anymore, last user is last */ fib_info_put(): { if (atomic_dec_and_test(&fi->fib_clntref)) fib_info_destroy(fi); } /* Release everything except long living fields (fib_metrics) */ fib_info_release(): { change_nexthops(fi) { if (nexthop_nh->nh_dev) dev_put(nexthop_nh->nh_dev); lwtstate_put(nexthop_nh->nh_lwtstate); free_nh_exceptions(nexthop_nh); rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output); rt_fibinfo_free(&nexthop_nh->nh_rth_input); } endfor_nexthops(fi); } /* RCU grace period after unlink */ fib_release_info_rcu(): { struct fib_info *fi = container_of(head, struct fib_info, rcu); fib_info_release(fi); fib_info_put(fi); } /* Called just on error */ free_fib_info(): { fib_info_release(fi); fib_info_put(fi); } fib_release_info(): { ... fi->fib_dead = 1; - fib_info_put(fi); + call_rcu(&fi->rcu, fib_release_info_rcu); } > > I'm thinking for the following solution which > > is a bit hackish: > > > > - on NETDEV_UNREGISTER we want to put the nh_dev references, > > so fib_release_info is a good place. But as fib_release_info > > is not always called, we will have two places that put > > references. We can use such hack: > > > > - for example, use nh_oif to know if dev_put is > > already called > > > > - fib_release_info() should set nh_oif to 0 because > > it will now call dev_put without clearing nh_dev > > Are you sure we are safe to call dev_put() in fib_release_info() > for _all_ paths, especially non-unregister paths? See: Yep, dev_put is safe there... > commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 > Author: Yanmin Zhang <yanmin_zhang@linux.intel.com> > Date: Wed May 23 15:39:45 2012 +0000 > > ipv4: fix the rcu race between free_fib_info and ip_route_output_slow ...as long as we do not set nh_dev to NULL > But, hmm, very interesting, I always thought dev_put() triggers a > kfree() potentially, but here your suggestion actually leverages the fact > that it is merely a pcpu counter decrease, so for unregister path, > this is just giving refcnt back, which means it is safe as long as > we don't have any parallel unregister? We should because of RTNL. Yes, we are in the context of unregistration and there is a rcu_barrier() that prevents device to be destroyed while there are RCU users. Refcnt reaching 0 is not enough to free dev, RCU users must be considered, they do not get reference. > I see why you say this is hackish, really it is. ;) We have to ensure > the evil dev_put() is only called on unregister path. Or after a RCU grace period but not later... > > - fib_dev/nh_dev != NULL checks are not needed, so this addresses > > Eric's concerns. BTW, fib_route_seq_show actually checks > > for fi->fib_dev, may be not in a safe way (lockless_dereference > > needed?) but as we don't set nh_dev to NULL this is not > > needed. > > > > I think Eric was complaining about the lack of rcu_dereference() > there. Not needed if we don't set nh_dev to NULL. > > What more? What about nh_pcpu_rth_output and > > nh_rth_input holding routes? We should think about > > them too. I should think more if nh_oif trick can work > > for them, may be not because nh_oif is optional... > > May be we should purge them somehow? > > > > Or maybe don't touch nh_oif but using a new flag in > nh_flags?? We only need to know if we should call > dev_put() in free_fib_info_rcu(). fib_dead is a better option if we decide to use such solution: 1=not called by fib_release_info, 2=called by fib_release_info. Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-12 21:27 ` Julian Anastasov @ 2017-05-15 18:34 ` Cong Wang 2017-05-15 20:37 ` Julian Anastasov 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-15 18:34 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Fri, May 12, 2017 at 2:27 PM, Julian Anastasov <ja@ssi.bg> wrote: > Now the main question: is FIB_LOOKUP_NOREF used > everywhere in IPv4? I guess so. If not, it means > someone can walk its res->fi NHs which is bad. I think, > this will delay the unregistration for long time and we > can not solve the problem. > > If yes, free_fib_info() should not use call_rcu. > Instead, fib_release_info() will start RCU callback to > drop everything via a common function for fib_release_info > and free_fib_info. As result, the last fib_info_put will > just need to free fi->fib_metrics and fi. Yes it is used. But this is a different problem from the dev refcnt issue, right? I can send a separate patch to address it. >> Are you sure we are safe to call dev_put() in fib_release_info() >> for _all_ paths, especially non-unregister paths? See: > > Yep, dev_put is safe there... > >> commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 >> Author: Yanmin Zhang <yanmin_zhang@linux.intel.com> >> Date: Wed May 23 15:39:45 2012 +0000 >> >> ipv4: fix the rcu race between free_fib_info and ip_route_output_slow > > ...as long as we do not set nh_dev to NULL > OK, fair enough, then I think the best solution here is to move the dev_put() from free_fib_info_rcu() to fib_release_info(), fib_nh is already removed from hash there anyway. diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index da449dd..cb712d1 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -205,8 +205,6 @@ static void free_fib_info_rcu(struct rcu_head *head) struct fib_info *fi = container_of(head, struct fib_info, rcu); change_nexthops(fi) { - if (nexthop_nh->nh_dev) - dev_put(nexthop_nh->nh_dev); lwtstate_put(nexthop_nh->nh_lwtstate); free_nh_exceptions(nexthop_nh); rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output); @@ -246,6 +244,14 @@ void fib_release_info(struct fib_info *fi) if (!nexthop_nh->nh_dev) continue; hlist_del(&nexthop_nh->nh_hash); + /* We have to release these nh_dev here because a dst + * could still hold a fib_info via rt->fi, we can't wait + * for GC, a socket could hold the dst for a long time. + * + * This is safe, dev_put() alone does not really free + * the netdevice, we just have to put the refcnt back. + */ + dev_put(nexthop_nh->nh_dev); } endfor_nexthops(fi) fi->fib_dead = 1; fib_info_put(fi); Thanks! ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-15 18:34 ` Cong Wang @ 2017-05-15 20:37 ` Julian Anastasov 2017-05-15 22:13 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-15 20:37 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Mon, 15 May 2017, Cong Wang wrote: > On Fri, May 12, 2017 at 2:27 PM, Julian Anastasov <ja@ssi.bg> wrote: > > Now the main question: is FIB_LOOKUP_NOREF used > > everywhere in IPv4? I guess so. If not, it means > > someone can walk its res->fi NHs which is bad. I think, > > this will delay the unregistration for long time and we > > can not solve the problem. > > > > If yes, free_fib_info() should not use call_rcu. > > Instead, fib_release_info() will start RCU callback to > > drop everything via a common function for fib_release_info > > and free_fib_info. As result, the last fib_info_put will > > just need to free fi->fib_metrics and fi. > > > Yes it is used. But this is a different problem from the > dev refcnt issue, right? I can send a separate patch to > address it. Any user that does not set FIB_LOOKUP_NOREF will need nh_dev refcounts. The assumption is that the NHs are accessed, who knows, may be even after RCU grace period. As result, we can not use dev_put on NETDEV_UNREGISTER. So, we should check if there are users that do not set FIB_LOOKUP_NOREF, at first look, I don't see such ones for IPv4. > >> Are you sure we are safe to call dev_put() in fib_release_info() > >> for _all_ paths, especially non-unregister paths? See: > > > > Yep, dev_put is safe there... > > > >> commit e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 > >> Author: Yanmin Zhang <yanmin_zhang@linux.intel.com> > >> Date: Wed May 23 15:39:45 2012 +0000 > >> > >> ipv4: fix the rcu race between free_fib_info and ip_route_output_slow > > > > ...as long as we do not set nh_dev to NULL > > > > OK, fair enough, then I think the best solution here is to move > the dev_put() from free_fib_info_rcu() to fib_release_info(), > fib_nh is already removed from hash there anyway. free_fib_info still needs to put the references, that is the reason for the common fib_info_release() in my example. It happens in fib_create_info() where free_fib_info() is called. The func names in my example can be corrected, if needed. > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > index da449dd..cb712d1 100644 > --- a/net/ipv4/fib_semantics.c > +++ b/net/ipv4/fib_semantics.c > @@ -205,8 +205,6 @@ static void free_fib_info_rcu(struct rcu_head *head) > struct fib_info *fi = container_of(head, struct fib_info, rcu); > > change_nexthops(fi) { > - if (nexthop_nh->nh_dev) > - dev_put(nexthop_nh->nh_dev); > lwtstate_put(nexthop_nh->nh_lwtstate); > free_nh_exceptions(nexthop_nh); > rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output); > @@ -246,6 +244,14 @@ void fib_release_info(struct fib_info *fi) > if (!nexthop_nh->nh_dev) > continue; > hlist_del(&nexthop_nh->nh_hash); > + /* We have to release these nh_dev here because a dst > + * could still hold a fib_info via rt->fi, we can't wait > + * for GC, a socket could hold the dst for a long time. > + * > + * This is safe, dev_put() alone does not really free > + * the netdevice, we just have to put the refcnt back. > + */ > + dev_put(nexthop_nh->nh_dev); > } endfor_nexthops(fi) > fi->fib_dead = 1; Such solution needs the fib_dead = 1|2 game to know who dropped the nh_dev reference, fib_release_info (2) or fib_create_info (1). You can not remove the dev_put calls from free_fib_info_rcu. > fib_info_put(fi); Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-15 20:37 ` Julian Anastasov @ 2017-05-15 22:13 ` Cong Wang 2017-05-16 7:46 ` Julian Anastasov 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-15 22:13 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Mon, May 15, 2017 at 1:37 PM, Julian Anastasov <ja@ssi.bg> wrote: > Any user that does not set FIB_LOOKUP_NOREF > will need nh_dev refcounts. The assumption is that the > NHs are accessed, who knows, may be even after RCU grace > period. As result, we can not use dev_put on NETDEV_UNREGISTER. > So, we should check if there are users that do not > set FIB_LOOKUP_NOREF, at first look, I don't see such ones > for IPv4. I see, although we do have FIB_LOOKUP_NOREF set all the times, there are other places we hold fib_clntref too, for example mlxsw_sp_router_fib_event_work(), it actually uses nh_dev too... So I am afraid moving dev_put() to fib_release_info() is not a solution here. I have to rethink about it. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-15 22:13 ` Cong Wang @ 2017-05-16 7:46 ` Julian Anastasov 2017-05-16 17:53 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Julian Anastasov @ 2017-05-16 7:46 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet Hello, On Mon, 15 May 2017, Cong Wang wrote: > On Mon, May 15, 2017 at 1:37 PM, Julian Anastasov <ja@ssi.bg> wrote: > > Any user that does not set FIB_LOOKUP_NOREF > > will need nh_dev refcounts. The assumption is that the > > NHs are accessed, who knows, may be even after RCU grace > > period. As result, we can not use dev_put on NETDEV_UNREGISTER. > > So, we should check if there are users that do not > > set FIB_LOOKUP_NOREF, at first look, I don't see such ones > > for IPv4. > > I see, although we do have FIB_LOOKUP_NOREF set all the times, > there are other places we hold fib_clntref too, for example > mlxsw_sp_router_fib_event_work(), it actually uses nh_dev too... > > So I am afraid moving dev_put() to fib_release_info() is not a solution > here. I have to rethink about it. At first look, they use fib_info_hold() to get fib_clntref reference from places where fib_treeref is not fatally decreased to 0 but later a work is used which finishes the job. I guess, we can convert such places to use just a fib_treeref reference. They can use such new method instead of fib_info_hold: void fib_treeref_get(struct fib_info *fi) { spin_lock_bh(&fib_info_lock); fi->fib_treeref++; spin_unlock_bh(&fib_info_lock); } They will use fib_release_info() to put the reference. But on FIB_EVENT_ENTRY_DEL there is a small window where the scheduled work delays the unlink of fib info from the hash tables, i.e. there is a risk fib_find_info to reuse a dead fib info. May be we can add a fi->fib_flags & RTNH_F_DEAD check there but the problem is that it is set also on NETDEV_DOWN. While attempts to add route to device with !(dev->flags & IFF_UP) is rejected by fib_check_nh(), fib_create_info still can create routes when cfg->fc_scope == RT_SCOPE_HOST. So, RTNH_F_DEAD check in fib_find_info can avoid the reuse of fib info for host routes while device is down but not unregistered. As result, many fib infos can be created instead of one that is reused. Adding new RTNH_F_* flag to properly handle this does not look good... Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-16 7:46 ` Julian Anastasov @ 2017-05-16 17:53 ` Cong Wang 2017-05-16 18:16 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-16 17:53 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 16, 2017 at 12:46 AM, Julian Anastasov <ja@ssi.bg> wrote: > > Hello, > > On Mon, 15 May 2017, Cong Wang wrote: > >> On Mon, May 15, 2017 at 1:37 PM, Julian Anastasov <ja@ssi.bg> wrote: >> > Any user that does not set FIB_LOOKUP_NOREF >> > will need nh_dev refcounts. The assumption is that the >> > NHs are accessed, who knows, may be even after RCU grace >> > period. As result, we can not use dev_put on NETDEV_UNREGISTER. >> > So, we should check if there are users that do not >> > set FIB_LOOKUP_NOREF, at first look, I don't see such ones >> > for IPv4. >> >> I see, although we do have FIB_LOOKUP_NOREF set all the times, >> there are other places we hold fib_clntref too, for example >> mlxsw_sp_router_fib_event_work(), it actually uses nh_dev too... >> >> So I am afraid moving dev_put() to fib_release_info() is not a solution >> here. I have to rethink about it. > > At first look, they use fib_info_hold() to get fib_clntref > reference from places where fib_treeref is not fatally decreased > to 0 but later a work is used which finishes the job. I guess, we > can convert such places to use just a fib_treeref reference. > They can use such new method instead of fib_info_hold: > > void fib_treeref_get(struct fib_info *fi) > { > spin_lock_bh(&fib_info_lock); > fi->fib_treeref++; > spin_unlock_bh(&fib_info_lock); > } > > They will use fib_release_info() to put the reference. But > on FIB_EVENT_ENTRY_DEL there is a small window where the > scheduled work delays the unlink of fib info from the > hash tables, i.e. there is a risk fib_find_info to reuse > a dead fib info. > Right, that seems risky. How about adding a check for ->fib_dead in these work's? If the last treeref is gone, probably it is no longer needed to continue to do the offloading work. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [Patch net] ipv4: restore rt->fi for reference counting 2017-05-16 17:53 ` Cong Wang @ 2017-05-16 18:16 ` Cong Wang [not found] ` <1495572267.6465.79.camel@edumazet-glaptop3.roam.corp.google.com> 0 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-16 18:16 UTC (permalink / raw) To: Julian Anastasov Cc: Eric Dumazet, David Miller, Linux Kernel Network Developers, Andrey Konovalov, Eric Dumazet On Tue, May 16, 2017 at 10:53 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > How about adding a check for ->fib_dead in these work's? > If the last treeref is gone, probably it is no longer needed > to continue to do the offloading work. Hmm, this doesn't look correct either, we may miss an offloading work if fib_release_info() comes first. ^ permalink raw reply [flat|nested] 43+ messages in thread
[parent not found: <1495572267.6465.79.camel@edumazet-glaptop3.roam.corp.google.com>]
[parent not found: <CAM_iQpX0X3h4Sf+bHUXdJgBqUTxNat0FBT0PeRpLYWju9ci59Q@mail.gmail.com>]
[parent not found: <CANn89i+mPR+7-AVO2Dsd=KfO=COOVY42AKjwEs=0=GUCML6HUQ@mail.gmail.com>]
[parent not found: <CAM_iQpUfLmN3yWsCfpx4ZTptBnuYFNuY5CjBKdwoDpvH5K8P=w@mail.gmail.com>]
[parent not found: <1495665921.6465.95.camel@edumazet-glaptop3.roam.corp.google.com>]
* [PATCH net] ipv4: add reference counting to metrics [not found] ` <1495665921.6465.95.camel@edumazet-glaptop3.roam.corp.google.com> @ 2017-05-25 21:27 ` Eric Dumazet 2017-05-25 22:25 ` Julian Anastasov ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Eric Dumazet @ 2017-05-25 21:27 UTC (permalink / raw) To: David Miller Cc: Eric Dumazet, Andrey Konovalov, Julian Anastasov, Cong Wang, netdev From: Eric Dumazet <edumazet@google.com> Andrey Konovalov reported crashes in ipv4_mtu() I could reproduce the issue with KASAN kernels, between 10.246.7.151 and 10.246.7.152 : 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & 2) At the same time run following loop : while : do ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 done Cong Wang attempted to add back rt->fi in commit 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") but this proved to add some issues that were complex to solve. Instead, I suggested to add a refcount to the metrics themselves, being a standalone object (in particular, no reference to other objects) I tried to make this patch as small as possible to ease its backport, instead of being super clean. Note that we believe that only ipv4 dst need to take care of the metric refcount. But if this is wrong, this patch adds the basic infrastructure to extend this to other families. Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang for his efforts on this problem. Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> --- include/net/dst.h | 8 +++++++- include/net/ip_fib.h | 10 +++++----- net/core/dst.c | 23 ++++++++++++++--------- net/ipv4/fib_semantics.c | 17 ++++++++++------- net/ipv4/route.c | 10 +++++++++- 5 files changed, 45 insertions(+), 23 deletions(-) diff --git a/include/net/dst.h b/include/net/dst.h index 049af33da3b6c95897d544670cea65c542317673..cfc0437841665d7ed46a714915c50d723c24901c 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -107,10 +107,16 @@ struct dst_entry { }; }; +struct dst_metrics { + u32 metrics[RTAX_MAX]; + atomic_t refcnt; +}; +extern const struct dst_metrics dst_default_metrics; + u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old); -extern const u32 dst_default_metrics[]; #define DST_METRICS_READ_ONLY 0x1UL +#define DST_METRICS_REFCOUNTED 0x2UL #define DST_METRICS_FLAGS 0x3UL #define __DST_METRICS_PTR(Y) \ ((u32 *)((Y) & ~DST_METRICS_FLAGS)) diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h index 6692c5758b332d468f1e0611ecc4f3e03ae03b2b..f7f6aa789c6174c41ca9739206d586c559c1f3a1 100644 --- a/include/net/ip_fib.h +++ b/include/net/ip_fib.h @@ -114,11 +114,11 @@ struct fib_info { __be32 fib_prefsrc; u32 fib_tb_id; u32 fib_priority; - u32 *fib_metrics; -#define fib_mtu fib_metrics[RTAX_MTU-1] -#define fib_window fib_metrics[RTAX_WINDOW-1] -#define fib_rtt fib_metrics[RTAX_RTT-1] -#define fib_advmss fib_metrics[RTAX_ADVMSS-1] + struct dst_metrics *fib_metrics; +#define fib_mtu fib_metrics->metrics[RTAX_MTU-1] +#define fib_window fib_metrics->metrics[RTAX_WINDOW-1] +#define fib_rtt fib_metrics->metrics[RTAX_RTT-1] +#define fib_advmss fib_metrics->metrics[RTAX_ADVMSS-1] int fib_nhs; #ifdef CONFIG_IP_ROUTE_MULTIPATH int fib_weight; diff --git a/net/core/dst.c b/net/core/dst.c index 960e503b5a529a2c4f1866f49c150493ee98d7da..6192f11beec9077de964e2aeff4f78547f08b8da 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -151,13 +151,13 @@ int dst_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL(dst_discard_out); -const u32 dst_default_metrics[RTAX_MAX + 1] = { +const struct dst_metrics dst_default_metrics = { /* This initializer is needed to force linker to place this variable * into const section. Otherwise it might end into bss section. * We really want to avoid false sharing on this variable, and catch * any writes on it. */ - [RTAX_MAX] = 0xdeadbeef, + .refcnt = ATOMIC_INIT(1), }; void dst_init(struct dst_entry *dst, struct dst_ops *ops, @@ -169,7 +169,7 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops, if (dev) dev_hold(dev); dst->ops = ops; - dst_init_metrics(dst, dst_default_metrics, true); + dst_init_metrics(dst, dst_default_metrics.metrics, true); dst->expires = 0UL; dst->path = dst; dst->from = NULL; @@ -314,25 +314,30 @@ EXPORT_SYMBOL(dst_release); u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old) { - u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC); + struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC); if (p) { - u32 *old_p = __DST_METRICS_PTR(old); + struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old); unsigned long prev, new; - memcpy(p, old_p, sizeof(u32) * RTAX_MAX); + atomic_set(&p->refcnt, 1); + memcpy(p->metrics, old_p->metrics, sizeof(p->metrics)); new = (unsigned long) p; prev = cmpxchg(&dst->_metrics, old, new); if (prev != old) { kfree(p); - p = __DST_METRICS_PTR(prev); + p = (struct dst_metrics *)__DST_METRICS_PTR(prev); if (prev & DST_METRICS_READ_ONLY) p = NULL; + } else if (prev & DST_METRICS_REFCOUNTED) { + if (atomic_dec_and_test(&old_p->refcnt)) + kfree(old_p); } } - return p; + BUILD_BUG_ON(offsetof(struct dst_metrics, metrics) != 0); + return (u32 *)p; } EXPORT_SYMBOL(dst_cow_metrics_generic); @@ -341,7 +346,7 @@ void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old) { unsigned long prev, new; - new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY; + new = ((unsigned long) &dst_default_metrics) | DST_METRICS_READ_ONLY; prev = cmpxchg(&dst->_metrics, old, new); if (prev == old) kfree(__DST_METRICS_PTR(old)); diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index da449ddb8cc172bd9091c00057a69a095f98b56d..ad9ad4aab5da7c7d11c3b80edbdfcbdd3d7153fe 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -203,6 +203,7 @@ static void rt_fibinfo_free_cpus(struct rtable __rcu * __percpu *rtp) static void free_fib_info_rcu(struct rcu_head *head) { struct fib_info *fi = container_of(head, struct fib_info, rcu); + struct dst_metrics *m; change_nexthops(fi) { if (nexthop_nh->nh_dev) @@ -213,8 +214,9 @@ static void free_fib_info_rcu(struct rcu_head *head) rt_fibinfo_free(&nexthop_nh->nh_rth_input); } endfor_nexthops(fi); - if (fi->fib_metrics != (u32 *) dst_default_metrics) - kfree(fi->fib_metrics); + m = fi->fib_metrics; + if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt)) + kfree(m); kfree(fi); } @@ -971,11 +973,11 @@ fib_convert_metrics(struct fib_info *fi, const struct fib_config *cfg) val = 255; if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK)) return -EINVAL; - fi->fib_metrics[type - 1] = val; + fi->fib_metrics->metrics[type - 1] = val; } if (ecn_ca) - fi->fib_metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; + fi->fib_metrics->metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; return 0; } @@ -1033,11 +1035,12 @@ struct fib_info *fib_create_info(struct fib_config *cfg) goto failure; fib_info_cnt++; if (cfg->fc_mx) { - fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); + fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL); if (!fi->fib_metrics) goto failure; + atomic_set(&fi->fib_metrics->refcnt, 1); } else - fi->fib_metrics = (u32 *) dst_default_metrics; + fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics; fi->fib_net = net; fi->fib_protocol = cfg->fc_protocol; @@ -1238,7 +1241,7 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event, if (fi->fib_priority && nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority)) goto nla_put_failure; - if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0) + if (rtnetlink_put_metrics(skb, fi->fib_metrics->metrics) < 0) goto nla_put_failure; if (fi->fib_prefsrc && diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 655d9eebe43e16a59102edcd3ea4bc177c6b341d..6883b3d4ba8f69de2cb924612d60f5671a219a84 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1385,8 +1385,12 @@ static void rt_add_uncached_list(struct rtable *rt) static void ipv4_dst_destroy(struct dst_entry *dst) { + struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst); struct rtable *rt = (struct rtable *) dst; + if (p != &dst_default_metrics && atomic_dec_and_test(&p->refcnt)) + kfree(p); + if (!list_empty(&rt->rt_uncached)) { struct uncached_list *ul = rt->rt_uncached_list; @@ -1438,7 +1442,11 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr, rt->rt_gateway = nh->nh_gw; rt->rt_uses_gateway = 1; } - dst_init_metrics(&rt->dst, fi->fib_metrics, true); + dst_init_metrics(&rt->dst, fi->fib_metrics->metrics, true); + if (fi->fib_metrics != &dst_default_metrics) { + rt->dst._metrics |= DST_METRICS_REFCOUNTED; + atomic_inc(&fi->fib_metrics->refcnt); + } #ifdef CONFIG_IP_ROUTE_CLASSID rt->dst.tclassid = nh->nh_tclassid; #endif ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH net] ipv4: add reference counting to metrics 2017-05-25 21:27 ` [PATCH net] ipv4: add reference counting to metrics Eric Dumazet @ 2017-05-25 22:25 ` Julian Anastasov 2017-05-26 17:08 ` Cong Wang 2017-05-26 18:58 ` David Miller 2 siblings, 0 replies; 43+ messages in thread From: Julian Anastasov @ 2017-05-25 22:25 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Eric Dumazet, Andrey Konovalov, Cong Wang, netdev Hello, On Thu, 25 May 2017, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Andrey Konovalov reported crashes in ipv4_mtu() > > I could reproduce the issue with KASAN kernels, between > 10.246.7.151 and 10.246.7.152 : > > 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & > > 2) At the same time run following loop : > while : > do > ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > done > > > Cong Wang attempted to add back rt->fi in commit > 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") > but this proved to add some issues that were complex to solve. > > Instead, I suggested to add a refcount to the metrics themselves, > being a standalone object (in particular, no reference to other objects) > > I tried to make this patch as small as possible to ease its backport, > instead of being super clean. Note that we believe that only ipv4 dst > need to take care of the metric refcount. But if this is wrong, > this patch adds the basic infrastructure to extend this to other > families. > > Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang > for his efforts on this problem. > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Andrey Konovalov <andreyknvl@google.com> Nice work, thanks! Reviewed-by: Julian Anastasov <ja@ssi.bg> > --- > include/net/dst.h | 8 +++++++- > include/net/ip_fib.h | 10 +++++----- > net/core/dst.c | 23 ++++++++++++++--------- > net/ipv4/fib_semantics.c | 17 ++++++++++------- > net/ipv4/route.c | 10 +++++++++- > 5 files changed, 45 insertions(+), 23 deletions(-) > > diff --git a/include/net/dst.h b/include/net/dst.h > index 049af33da3b6c95897d544670cea65c542317673..cfc0437841665d7ed46a714915c50d723c24901c 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -107,10 +107,16 @@ struct dst_entry { > }; > }; > > +struct dst_metrics { > + u32 metrics[RTAX_MAX]; > + atomic_t refcnt; > +}; > +extern const struct dst_metrics dst_default_metrics; > + > u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old); > -extern const u32 dst_default_metrics[]; > > #define DST_METRICS_READ_ONLY 0x1UL > +#define DST_METRICS_REFCOUNTED 0x2UL > #define DST_METRICS_FLAGS 0x3UL > #define __DST_METRICS_PTR(Y) \ > ((u32 *)((Y) & ~DST_METRICS_FLAGS)) > diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h > index 6692c5758b332d468f1e0611ecc4f3e03ae03b2b..f7f6aa789c6174c41ca9739206d586c559c1f3a1 100644 > --- a/include/net/ip_fib.h > +++ b/include/net/ip_fib.h > @@ -114,11 +114,11 @@ struct fib_info { > __be32 fib_prefsrc; > u32 fib_tb_id; > u32 fib_priority; > - u32 *fib_metrics; > -#define fib_mtu fib_metrics[RTAX_MTU-1] > -#define fib_window fib_metrics[RTAX_WINDOW-1] > -#define fib_rtt fib_metrics[RTAX_RTT-1] > -#define fib_advmss fib_metrics[RTAX_ADVMSS-1] > + struct dst_metrics *fib_metrics; > +#define fib_mtu fib_metrics->metrics[RTAX_MTU-1] > +#define fib_window fib_metrics->metrics[RTAX_WINDOW-1] > +#define fib_rtt fib_metrics->metrics[RTAX_RTT-1] > +#define fib_advmss fib_metrics->metrics[RTAX_ADVMSS-1] > int fib_nhs; > #ifdef CONFIG_IP_ROUTE_MULTIPATH > int fib_weight; > diff --git a/net/core/dst.c b/net/core/dst.c > index 960e503b5a529a2c4f1866f49c150493ee98d7da..6192f11beec9077de964e2aeff4f78547f08b8da 100644 > --- a/net/core/dst.c > +++ b/net/core/dst.c > @@ -151,13 +151,13 @@ int dst_discard_out(struct net *net, struct sock *sk, struct sk_buff *skb) > } > EXPORT_SYMBOL(dst_discard_out); > > -const u32 dst_default_metrics[RTAX_MAX + 1] = { > +const struct dst_metrics dst_default_metrics = { > /* This initializer is needed to force linker to place this variable > * into const section. Otherwise it might end into bss section. > * We really want to avoid false sharing on this variable, and catch > * any writes on it. > */ > - [RTAX_MAX] = 0xdeadbeef, > + .refcnt = ATOMIC_INIT(1), > }; > > void dst_init(struct dst_entry *dst, struct dst_ops *ops, > @@ -169,7 +169,7 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops, > if (dev) > dev_hold(dev); > dst->ops = ops; > - dst_init_metrics(dst, dst_default_metrics, true); > + dst_init_metrics(dst, dst_default_metrics.metrics, true); > dst->expires = 0UL; > dst->path = dst; > dst->from = NULL; > @@ -314,25 +314,30 @@ EXPORT_SYMBOL(dst_release); > > u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old) > { > - u32 *p = kmalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC); > + struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC); > > if (p) { > - u32 *old_p = __DST_METRICS_PTR(old); > + struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old); > unsigned long prev, new; > > - memcpy(p, old_p, sizeof(u32) * RTAX_MAX); > + atomic_set(&p->refcnt, 1); > + memcpy(p->metrics, old_p->metrics, sizeof(p->metrics)); > > new = (unsigned long) p; > prev = cmpxchg(&dst->_metrics, old, new); > > if (prev != old) { > kfree(p); > - p = __DST_METRICS_PTR(prev); > + p = (struct dst_metrics *)__DST_METRICS_PTR(prev); > if (prev & DST_METRICS_READ_ONLY) > p = NULL; > + } else if (prev & DST_METRICS_REFCOUNTED) { > + if (atomic_dec_and_test(&old_p->refcnt)) > + kfree(old_p); > } > } > - return p; > + BUILD_BUG_ON(offsetof(struct dst_metrics, metrics) != 0); > + return (u32 *)p; > } > EXPORT_SYMBOL(dst_cow_metrics_generic); > > @@ -341,7 +346,7 @@ void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old) > { > unsigned long prev, new; > > - new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY; > + new = ((unsigned long) &dst_default_metrics) | DST_METRICS_READ_ONLY; > prev = cmpxchg(&dst->_metrics, old, new); > if (prev == old) > kfree(__DST_METRICS_PTR(old)); > diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c > index da449ddb8cc172bd9091c00057a69a095f98b56d..ad9ad4aab5da7c7d11c3b80edbdfcbdd3d7153fe 100644 > --- a/net/ipv4/fib_semantics.c > +++ b/net/ipv4/fib_semantics.c > @@ -203,6 +203,7 @@ static void rt_fibinfo_free_cpus(struct rtable __rcu * __percpu *rtp) > static void free_fib_info_rcu(struct rcu_head *head) > { > struct fib_info *fi = container_of(head, struct fib_info, rcu); > + struct dst_metrics *m; > > change_nexthops(fi) { > if (nexthop_nh->nh_dev) > @@ -213,8 +214,9 @@ static void free_fib_info_rcu(struct rcu_head *head) > rt_fibinfo_free(&nexthop_nh->nh_rth_input); > } endfor_nexthops(fi); > > - if (fi->fib_metrics != (u32 *) dst_default_metrics) > - kfree(fi->fib_metrics); > + m = fi->fib_metrics; > + if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt)) > + kfree(m); > kfree(fi); > } > > @@ -971,11 +973,11 @@ fib_convert_metrics(struct fib_info *fi, const struct fib_config *cfg) > val = 255; > if (type == RTAX_FEATURES && (val & ~RTAX_FEATURE_MASK)) > return -EINVAL; > - fi->fib_metrics[type - 1] = val; > + fi->fib_metrics->metrics[type - 1] = val; > } > > if (ecn_ca) > - fi->fib_metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > + fi->fib_metrics->metrics[RTAX_FEATURES - 1] |= DST_FEATURE_ECN_CA; > > return 0; > } > @@ -1033,11 +1035,12 @@ struct fib_info *fib_create_info(struct fib_config *cfg) > goto failure; > fib_info_cnt++; > if (cfg->fc_mx) { > - fi->fib_metrics = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL); > + fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL); > if (!fi->fib_metrics) > goto failure; > + atomic_set(&fi->fib_metrics->refcnt, 1); > } else > - fi->fib_metrics = (u32 *) dst_default_metrics; > + fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics; > > fi->fib_net = net; > fi->fib_protocol = cfg->fc_protocol; > @@ -1238,7 +1241,7 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event, > if (fi->fib_priority && > nla_put_u32(skb, RTA_PRIORITY, fi->fib_priority)) > goto nla_put_failure; > - if (rtnetlink_put_metrics(skb, fi->fib_metrics) < 0) > + if (rtnetlink_put_metrics(skb, fi->fib_metrics->metrics) < 0) > goto nla_put_failure; > > if (fi->fib_prefsrc && > diff --git a/net/ipv4/route.c b/net/ipv4/route.c > index 655d9eebe43e16a59102edcd3ea4bc177c6b341d..6883b3d4ba8f69de2cb924612d60f5671a219a84 100644 > --- a/net/ipv4/route.c > +++ b/net/ipv4/route.c > @@ -1385,8 +1385,12 @@ static void rt_add_uncached_list(struct rtable *rt) > > static void ipv4_dst_destroy(struct dst_entry *dst) > { > + struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst); > struct rtable *rt = (struct rtable *) dst; > > + if (p != &dst_default_metrics && atomic_dec_and_test(&p->refcnt)) > + kfree(p); > + > if (!list_empty(&rt->rt_uncached)) { > struct uncached_list *ul = rt->rt_uncached_list; > > @@ -1438,7 +1442,11 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr, > rt->rt_gateway = nh->nh_gw; > rt->rt_uses_gateway = 1; > } > - dst_init_metrics(&rt->dst, fi->fib_metrics, true); > + dst_init_metrics(&rt->dst, fi->fib_metrics->metrics, true); > + if (fi->fib_metrics != &dst_default_metrics) { > + rt->dst._metrics |= DST_METRICS_REFCOUNTED; > + atomic_inc(&fi->fib_metrics->refcnt); > + } > #ifdef CONFIG_IP_ROUTE_CLASSID > rt->dst.tclassid = nh->nh_tclassid; > #endif Regards ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH net] ipv4: add reference counting to metrics 2017-05-25 21:27 ` [PATCH net] ipv4: add reference counting to metrics Eric Dumazet 2017-05-25 22:25 ` Julian Anastasov @ 2017-05-26 17:08 ` Cong Wang 2017-05-26 17:13 ` Eric Dumazet 2017-05-26 18:58 ` David Miller 2 siblings, 1 reply; 43+ messages in thread From: Cong Wang @ 2017-05-26 17:08 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, Eric Dumazet, Andrey Konovalov, Julian Anastasov, netdev On Thu, May 25, 2017 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > Andrey Konovalov reported crashes in ipv4_mtu() > > I could reproduce the issue with KASAN kernels, between > 10.246.7.151 and 10.246.7.152 : > > 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & > > 2) At the same time run following loop : > while : > do > ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > done > > > Cong Wang attempted to add back rt->fi in commit > 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") > but this proved to add some issues that were complex to solve. > > Instead, I suggested to add a refcount to the metrics themselves, > being a standalone object (in particular, no reference to other objects) > > I tried to make this patch as small as possible to ease its backport, > instead of being super clean. Note that we believe that only ipv4 dst > need to take care of the metric refcount. But if this is wrong, > this patch adds the basic infrastructure to extend this to other > families. > > Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang > for his efforts on this problem. > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Andrey Konovalov <andreyknvl@google.com> Nice work!! Thanks for your effort of making it small! Just one nit below. > -const u32 dst_default_metrics[RTAX_MAX + 1] = { > +const struct dst_metrics dst_default_metrics = { > /* This initializer is needed to force linker to place this variable > * into const section. Otherwise it might end into bss section. > * We really want to avoid false sharing on this variable, and catch > * any writes on it. > */ > - [RTAX_MAX] = 0xdeadbeef, > + .refcnt = ATOMIC_INIT(1), > }; The code comment above is no longer needed since we have to initialize refcnt to 1, instead of merely for const section. Anyway, Acked-by: Cong Wang <xiyou.wangcong@gmail.com> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH net] ipv4: add reference counting to metrics 2017-05-26 17:08 ` Cong Wang @ 2017-05-26 17:13 ` Eric Dumazet 2017-05-26 17:26 ` Cong Wang 0 siblings, 1 reply; 43+ messages in thread From: Eric Dumazet @ 2017-05-26 17:13 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, David Miller, Andrey Konovalov, Julian Anastasov, netdev On Fri, May 26, 2017 at 10:08 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Thu, May 25, 2017 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Just one nit below. > >> -const u32 dst_default_metrics[RTAX_MAX + 1] = { >> +const struct dst_metrics dst_default_metrics = { >> /* This initializer is needed to force linker to place this variable >> * into const section. Otherwise it might end into bss section. >> * We really want to avoid false sharing on this variable, and catch >> * any writes on it. >> */ >> - [RTAX_MAX] = 0xdeadbeef, >> + .refcnt = ATOMIC_INIT(1), >> }; > > The code comment above is no longer needed since > we have to initialize refcnt to 1, instead of merely for const > section. I believe the comment is still needed, because normally we make sure dst_default_metrics.refcnt is never touched (incremened nor decremented) So its value should not really matter ? I found that ATOMIC_INIT(1) was less ugly than the 0xdeadbeef ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH net] ipv4: add reference counting to metrics 2017-05-26 17:13 ` Eric Dumazet @ 2017-05-26 17:26 ` Cong Wang 0 siblings, 0 replies; 43+ messages in thread From: Cong Wang @ 2017-05-26 17:26 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, David Miller, Andrey Konovalov, Julian Anastasov, netdev On Fri, May 26, 2017 at 10:13 AM, Eric Dumazet <edumazet@google.com> wrote: > On Fri, May 26, 2017 at 10:08 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> On Thu, May 25, 2017 at 2:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> Just one nit below. >> >>> -const u32 dst_default_metrics[RTAX_MAX + 1] = { >>> +const struct dst_metrics dst_default_metrics = { >>> /* This initializer is needed to force linker to place this variable >>> * into const section. Otherwise it might end into bss section. >>> * We really want to avoid false sharing on this variable, and catch >>> * any writes on it. >>> */ >>> - [RTAX_MAX] = 0xdeadbeef, >>> + .refcnt = ATOMIC_INIT(1), >>> }; >> >> The code comment above is no longer needed since >> we have to initialize refcnt to 1, instead of merely for const >> section. > > I believe the comment is still needed, because normally we make sure > dst_default_metrics.refcnt is never touched (incremened nor > decremented) > > So its value should not really matter ? Oh, you're right, we always have the check on != &dst_default_metrics. > > I found that ATOMIC_INIT(1) was less ugly than the 0xdeadbeef Of course. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH net] ipv4: add reference counting to metrics 2017-05-25 21:27 ` [PATCH net] ipv4: add reference counting to metrics Eric Dumazet 2017-05-25 22:25 ` Julian Anastasov 2017-05-26 17:08 ` Cong Wang @ 2017-05-26 18:58 ` David Miller 2 siblings, 0 replies; 43+ messages in thread From: David Miller @ 2017-05-26 18:58 UTC (permalink / raw) To: eric.dumazet; +Cc: edumazet, andreyknvl, ja, xiyou.wangcong, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 25 May 2017 14:27:35 -0700 > From: Eric Dumazet <edumazet@google.com> > > Andrey Konovalov reported crashes in ipv4_mtu() > > I could reproduce the issue with KASAN kernels, between > 10.246.7.151 and 10.246.7.152 : > > 1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 & > > 2) At the same time run following loop : > while : > do > ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500 > done > > > Cong Wang attempted to add back rt->fi in commit > 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting") > but this proved to add some issues that were complex to solve. > > Instead, I suggested to add a refcount to the metrics themselves, > being a standalone object (in particular, no reference to other objects) > > I tried to make this patch as small as possible to ease its backport, > instead of being super clean. Note that we believe that only ipv4 dst > need to take care of the metric refcount. But if this is wrong, > this patch adds the basic infrastructure to extend this to other > families. > > Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang > for his efforts on this problem. > > Fixes: 2860583fe840 ("ipv4: Kill rt->fi") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Andrey Konovalov <andreyknvl@google.com> Applied, thanks everyone for following through on this bug fix. And sorry for introducing the problem in the first place :) ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2017-05-26 18:58 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-05-04 21:54 [Patch net] ipv4: restore rt->fi for reference counting Cong Wang 2017-05-07 18:53 ` Eric Dumazet 2017-05-08 18:35 ` David Miller 2017-05-09 0:01 ` Eric Dumazet 2017-05-09 1:22 ` David Miller 2017-05-09 2:18 ` Eric Dumazet 2017-05-09 2:35 ` David Miller 2017-05-09 16:44 ` Cong Wang 2017-05-09 16:56 ` Eric Dumazet 2017-05-09 20:56 ` Cong Wang 2017-05-09 22:07 ` Cong Wang 2017-05-09 22:52 ` Eric Dumazet 2017-05-09 22:54 ` Eric Dumazet 2017-05-09 23:09 ` Eric Dumazet 2017-05-09 23:35 ` Cong Wang 2017-05-09 23:50 ` Eric Dumazet 2017-05-10 16:32 ` Cong Wang 2017-05-09 23:51 ` Eric Dumazet 2017-05-10 16:40 ` Cong Wang 2017-05-10 7:38 ` Julian Anastasov 2017-05-10 17:00 ` Cong Wang 2017-05-10 19:51 ` Julian Anastasov 2017-05-12 0:07 ` Cong Wang 2017-05-12 1:22 ` Cong Wang 2017-05-12 4:55 ` Eric Dumazet 2017-05-12 17:49 ` Cong Wang 2017-05-12 6:39 ` Julian Anastasov 2017-05-12 17:27 ` Cong Wang 2017-05-12 20:58 ` Cong Wang 2017-05-12 21:13 ` Cong Wang 2017-05-12 21:27 ` Julian Anastasov 2017-05-15 18:34 ` Cong Wang 2017-05-15 20:37 ` Julian Anastasov 2017-05-15 22:13 ` Cong Wang 2017-05-16 7:46 ` Julian Anastasov 2017-05-16 17:53 ` Cong Wang 2017-05-16 18:16 ` Cong Wang [not found] ` <1495572267.6465.79.camel@edumazet-glaptop3.roam.corp.google.com> [not found] ` <CAM_iQpX0X3h4Sf+bHUXdJgBqUTxNat0FBT0PeRpLYWju9ci59Q@mail.gmail.com> [not found] ` <CANn89i+mPR+7-AVO2Dsd=KfO=COOVY42AKjwEs=0=GUCML6HUQ@mail.gmail.com> [not found] ` <CAM_iQpUfLmN3yWsCfpx4ZTptBnuYFNuY5CjBKdwoDpvH5K8P=w@mail.gmail.com> [not found] ` <1495665921.6465.95.camel@edumazet-glaptop3.roam.corp.google.com> 2017-05-25 21:27 ` [PATCH net] ipv4: add reference counting to metrics Eric Dumazet 2017-05-25 22:25 ` Julian Anastasov 2017-05-26 17:08 ` Cong Wang 2017-05-26 17:13 ` Eric Dumazet 2017-05-26 17:26 ` Cong Wang 2017-05-26 18:58 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).