Message ID | 20060729043325.GA7035@gondor.apana.org.au |
---|---|
State | New, archived |
Headers | show |
Series |
|
Related | show |
From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 29 Jul 2006 14:33:25 +1000 > [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls > > The current users of ip6_dst_lookup can be divided into two classes: > > 1) The caller holds no locks and is in user-context (UDP). > 2) The caller does not want to lookup the dst cache at all. > > The second class covers everyone except UDP because most people do > the cache lookup directly before calling ip6_dst_lookup. This patch > adds ip6_sk_dst_lookup for the first class. > > Similarly ip6_dst_store users can be divded into those that need to > take the socket dst lock and those that don't. This patch adds > __ip6_dst_store for those (everyone except UDP/datagram) that don't > need an extra lock. > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Applied, thanks Herbert. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Sun, Jul 30, 2006 at 03:44:16PM -0700, David Miller wrote: > From: Herbert Xu <herbert@gondor.apana.org.au> > Date: Sat, 29 Jul 2006 14:33:25 +1000 > > > [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls > > > > The current users of ip6_dst_lookup can be divided into two classes: > > > > 1) The caller holds no locks and is in user-context (UDP). > > 2) The caller does not want to lookup the dst cache at all. > > > > The second class covers everyone except UDP because most people do > > the cache lookup directly before calling ip6_dst_lookup. This patch > > adds ip6_sk_dst_lookup for the first class. > > > > Similarly ip6_dst_store users can be divded into those that need to > > take the socket dst lock and those that don't. This patch adds > > __ip6_dst_store for those (everyone except UDP/datagram) that don't > > need an extra lock. > > > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > Applied, thanks Herbert. I applied this on 2.6.18-rc3, and it panics immediately as the first IPv6 TCP (ssh) session is initiated to the system.
From: Matt Domsch <Matt_Domsch@dell.com> Date: Sun, 30 Jul 2006 22:32:10 -0500 > swapper/0 is trying to acquire lock: > (slock-AF_INET6){-+..}, at: [<ffffffff80414fda>] sk_clone+0xd2/0x3a8 > > but task is already holding lock: > (slock-AF_INET6){-+..}, at: [<ffffffff883d71a8>] tcp_v6_rcv+0x30e/0x76e [ipv6] The tcp_v6_rcv() code path is holding the slock on the parent listening socket, whereas in sk_clone() we have grabbed the socket slock on the newly allocated "newsk" socket. These are thus two seperate locks, although they are in the same locking class "slock-AF_INET6". I don't know why the lock validator is complaining about this. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Sun, Jul 30, 2006 at 10:32:10PM -0500, Matt Domsch wrote: > On Sun, Jul 30, 2006 at 03:44:16PM -0700, David Miller wrote: > > From: Herbert Xu <herbert@gondor.apana.org.au> > > Date: Sat, 29 Jul 2006 14:33:25 +1000 > > > > > [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls > > > > > > The current users of ip6_dst_lookup can be divided into two classes: > > > > > > 1) The caller holds no locks and is in user-context (UDP). > > > 2) The caller does not want to lookup the dst cache at all. > > > > > > The second class covers everyone except UDP because most people do > > > the cache lookup directly before calling ip6_dst_lookup. This patch > > > adds ip6_sk_dst_lookup for the first class. > > > > > > Similarly ip6_dst_store users can be divded into those that need to > > > take the socket dst lock and those that don't. This patch adds > > > __ip6_dst_store for those (everyone except UDP/datagram) that don't > > > need an extra lock. > > > > > > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > > > Applied, thanks Herbert. > > I applied this on 2.6.18-rc3, and it panics immediately as the first > IPv6 TCP (ssh) session is initiated to the system. I backed out this patch, and get a somewhat different failure then with 2.6.18-rc3 plain. The oops repeat continuously with <EOI> after each. Again, this occurs as soon as I start an IPv6 ssh session.
On Sun, Jul 30, 2006 at 10:32:10PM -0500, Matt Domsch wrote: > > I applied this on 2.6.18-rc3, and it panics immediately as the first > IPv6 TCP (ssh) session is initiated to the system. Executive summary: 1) We resolved one lockdep warning only to stumble onto another lockdep validator bug. 2) There is something broken in the x86_64 unwind code which is causing it to panic just about everytime somebody calls dump_stack(). Andi, this is the second time I've seen a report where an otherwise harmless dump_stack call (the other one was caused by a WARN_ON) gets turned into a panic by the stack unwind code on x86_64. This particular report is with 2.6.18-rc3 so it looks like whatever bug is causing it hasn't been fixed yet. Could you please have a look at it? Thanks. > ============================================= > [ INFO: possible recursive locking detected ] > --------------------------------------------- > swapper/0 is trying to acquire lock: > (slock-AF_INET6){-+..}, at: [<ffffffff80414fda>] sk_clone+0xd2/0x3a8 > > but task is already holding lock: > (slock-AF_INET6){-+..}, at: [<ffffffff883d71a8>] tcp_v6_rcv+0x30e/0x76e [ipv6] > > other info that might help us debug this: > 1 lock held by swapper/0: > #0: (slock-AF_INET6){-+..}, at: [<ffffffff883d71a8>] tcp_v6_rcv+0x30e/0x76e [ipv6] > > stack backtrace: > > Call Trace: > [<ffffffff8026f861>] show_trace+0xae/0x30e > [<ffffffff8026fad6>] dump_stack+0x15/0x17 > [<ffffffff802a73d4>] __lock_acquire+0x12e/0xa18 > [<ffffffff802a8232>] lock_acquire+0x4b/0x69 > [<ffffffff8026883b>] _spin_lock+0x25/0x31 > [<ffffffff80414fda>] sk_clone+0xd2/0x3a8 > [<ffffffff8043c8a7>] inet_csk_clone+0x11/0x6f > [<ffffffff80445615>] tcp_create_openreq_child+0x24/0x49c > [<ffffffff883d5d85>] :ipv6:tcp_v6_syn_recv_sock+0x2c5/0x6be > [<ffffffff80445c5e>] tcp_check_req+0x1d1/0x326 > [<ffffffff883d4f0e>] :ipv6:tcp_v6_do_rcv+0x15d/0x372 > [<ffffffff883d75b9>] :ipv6:tcp_v6_rcv+0x71f/0x76e > [<ffffffff883ba49f>] :ipv6:ip6_input+0x223/0x315 > [<ffffffff883bab4d>] :ipv6:ipv6_rcv+0x254/0x2af > [<ffffffff80221883>] netif_receive_skb+0x260/0x2dd > [<ffffffff88101292>] :e1000:e1000_clean_rx_irq+0x423/0x4c2 > [<ffffffff880ff752>] :e1000:e1000_clean+0x88/0x17d > [<ffffffff8020caed>] net_rx_action+0xac/0x1d1 > [<ffffffff80212809>] __do_softirq+0x68/0xf5 > [<ffffffff802626fa>] call_softirq+0x1e/0x28 Now let's move onto the lockdep validator bug :) Ingo/Arjen, I thought we've already fixed this before but somehow I can't find anything in the email archives so perhaps I'm mixing it up with another recursive lock false-positive. The problem here is really quite simple: when we accept a TCP connection there are two sockets involved. The listening socket which existed before the connection came in and the socket we construct for the newly arrived connection. The code works something like this: * Take slock on listening socket. * Construct child socket. * Take slock on child socket. As we never do the locking in the opposite direction (child followed by listening socket) this is safe. So perhaps we need to add some extra annotation in sk_clone? Cheers,
On Mon, 31 Jul 2006 19:04:33 +1000 Herbert Xu <herbert@gondor.apana.org.au> wrote: > 2) There is something broken in the x86_64 unwind code which is causing > it to panic just about everytime somebody calls dump_stack(). > > Andi, this is the second time I've seen a report where an otherwise > harmless dump_stack call (the other one was caused by a WARN_ON) gets > turned into a panic by the stack unwind code on x86_64. This particular > report is with 2.6.18-rc3 so it looks like whatever bug is causing it > hasn't been fixed yet. > > Could you please have a look at it? Thanks. Jan thinks this might have been fixed by a patch which he sent Andi a couple of days ago. Andi has sent that patch to Linus but I'm not sure which patch it was and I'm not sure whether it has been merged into mainline. But yes, -rc3 unwind has problems. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tuesday 01 August 2006 16:41, Andrew Morton wrote: > On Mon, 31 Jul 2006 19:04:33 +1000 > Herbert Xu <herbert@gondor.apana.org.au> wrote: > > > 2) There is something broken in the x86_64 unwind code which is causing > > it to panic just about everytime somebody calls dump_stack(). > > > > Andi, this is the second time I've seen a report where an otherwise > > harmless dump_stack call (the other one was caused by a WARN_ON) gets > > turned into a panic by the stack unwind code on x86_64. This particular > > report is with 2.6.18-rc3 so it looks like whatever bug is causing it > > hasn't been fixed yet. > > > > Could you please have a look at it? Thanks. > > Jan thinks this might have been fixed by a patch which he sent Andi a > couple of days ago. Andi has sent that patch to Linus I didn't send that particular patch before, just queued it, because I didn't realize that particular crash, but I have now send it yesterday. So far L. hasn't merged it unfortunately, but I will resend. > but I'm not sure > which patch it was "entry-more-unwind" was my version, there was another one from Jan > and I'm not sure whether it has been merged into > mainline. > > But yes, -rc3 unwind has problems. "unwinder stuck" messages are expected and not really fatal because they don't lose any information. I expect it will need some releases to work them all out fully, but then we'll hopefully have a much better unwinder that doesn't generate any false positives anymore. New crashes during unwinding are fatal though and I plan to fix them. So far this one was the only known one. I already got a lot of patches queued for .19 that fix more unwind information in a lot of assembly files. Still not fully complete though. Fixing it all properly unfortunately requires undoing some stuff, e.g. the unwinder cannot deal with separate lock sections, so I was slowly removing them. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index ab29daf..96b0e66 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -139,16 +139,22 @@ extern rwlock_t rt6_lock; /* * Store a destination cache entry in a socket */ -static inline void ip6_dst_store(struct sock *sk, struct dst_entry *dst, - struct in6_addr *daddr) +static inline void __ip6_dst_store(struct sock *sk, struct dst_entry *dst, + struct in6_addr *daddr) { struct ipv6_pinfo *np = inet6_sk(sk); struct rt6_info *rt = (struct rt6_info *) dst; - write_lock(&sk->sk_dst_lock); sk_setup_caps(sk, dst); np->daddr_cache = daddr; np->dst_cookie = rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0; +} + +static inline void ip6_dst_store(struct sock *sk, struct dst_entry *dst, + struct in6_addr *daddr) +{ + write_lock(&sk->sk_dst_lock); + __ip6_dst_store(sk, dst, daddr); write_unlock(&sk->sk_dst_lock); } diff --git a/include/net/ipv6.h b/include/net/ipv6.h index a8fdf79..ece7e8a 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -468,6 +468,9 @@ extern void ip6_flush_pending_frames(s extern int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi *fl); +extern int ip6_sk_dst_lookup(struct sock *sk, + struct dst_entry **dst, + struct flowi *fl); /* * skb processing functions diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 9f3d4d7..610c722 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -230,7 +230,7 @@ static int dccp_v6_connect(struct sock * ipv6_addr_copy(&np->saddr, saddr); inet->rcv_saddr = LOOPBACK4_IPV6; - ip6_dst_store(sk, dst, NULL); + __ip6_dst_store(sk, dst, NULL); icsk->icsk_ext_hdr_len = 0; if (np->opt != NULL) @@ -863,7 +863,7 @@ static struct sock *dccp_v6_request_recv * comment in that function for the gory details. -acme */ - ip6_dst_store(newsk, dst, NULL); + __ip6_dst_store(newsk, dst, NULL); newsk->sk_route_caps = dst->dev->features & ~(NETIF_F_IP_CSUM | NETIF_F_TSO); newdp6 = (struct dccp6_sock *)newsk; diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 5a0ba58..ac85e9c 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -658,7 +658,7 @@ int inet6_sk_rebuild_header(struct sock return err; } - ip6_dst_store(sk, dst, NULL); + __ip6_dst_store(sk, dst, NULL); } return 0; diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 5c950cc..bf49107 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -185,7 +185,7 @@ int inet6_csk_xmit(struct sk_buff *skb, return err; } - ip6_dst_store(sk, dst, NULL); + __ip6_dst_store(sk, dst, NULL); } skb->dst = dst_clone(dst); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 3bc74ce..5e74a37 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -723,48 +723,51 @@ fail: return err; } -int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi *fl) +static struct dst_entry *ip6_sk_dst_check(struct sock *sk, + struct dst_entry *dst, + struct flowi *fl) { - int err = 0; + struct ipv6_pinfo *np = inet6_sk(sk); + struct rt6_info *rt = (struct rt6_info *)dst; - *dst = NULL; - if (sk) { - struct ipv6_pinfo *np = inet6_sk(sk); - - *dst = sk_dst_check(sk, np->dst_cookie); - if (*dst) { - struct rt6_info *rt = (struct rt6_info*)*dst; - - /* Yes, checking route validity in not connected - * case is not very simple. Take into account, - * that we do not support routing by source, TOS, - * and MSG_DONTROUTE --ANK (980726) - * - * 1. If route was host route, check that - * cached destination is current. - * If it is network route, we still may - * check its validity using saved pointer - * to the last used address: daddr_cache. - * We do not want to save whole address now, - * (because main consumer of this service - * is tcp, which has not this problem), - * so that the last trick works only on connected - * sockets. - * 2. oif also should be the same. - */ - if (((rt->rt6i_dst.plen != 128 || - !ipv6_addr_equal(&fl->fl6_dst, - &rt->rt6i_dst.addr)) - && (np->daddr_cache == NULL || - !ipv6_addr_equal(&fl->fl6_dst, - np->daddr_cache))) - || (fl->oif && fl->oif != (*dst)->dev->ifindex)) { - dst_release(*dst); - *dst = NULL; - } - } + if (!dst) + goto out; + + /* Yes, checking route validity in not connected + * case is not very simple. Take into account, + * that we do not support routing by source, TOS, + * and MSG_DONTROUTE --ANK (980726) + * + * 1. If route was host route, check that + * cached destination is current. + * If it is network route, we still may + * check its validity using saved pointer + * to the last used address: daddr_cache. + * We do not want to save whole address now, + * (because main consumer of this service + * is tcp, which has not this problem), + * so that the last trick works only on connected + * sockets. + * 2. oif also should be the same. + */ + if (((rt->rt6i_dst.plen != 128 || + !ipv6_addr_equal(&fl->fl6_dst, &rt->rt6i_dst.addr)) + && (np->daddr_cache == NULL || + !ipv6_addr_equal(&fl->fl6_dst, np->daddr_cache))) + || (fl->oif && fl->oif != dst->dev->ifindex)) { + dst_release(dst); + dst = NULL; } +out: + return dst; +} + +static int ip6_dst_lookup_tail(struct sock *sk, + struct dst_entry **dst, struct flowi *fl) +{ + int err; + if (*dst == NULL) *dst = ip6_route_output(sk, fl); @@ -773,7 +776,6 @@ int ip6_dst_lookup(struct sock *sk, stru if (ipv6_addr_any(&fl->fl6_src)) { err = ipv6_get_saddr(*dst, &fl->fl6_dst, &fl->fl6_src); - if (err) goto out_err_release; } @@ -786,8 +788,48 @@ out_err_release: return err; } +/** + * ip6_dst_lookup - perform route lookup on flow + * @sk: socket which provides route info + * @dst: pointer to dst_entry * for result + * @fl: flow to lookup + * + * This function performs a route lookup on the given flow. + * + * It returns zero on success, or a standard errno code on error. + */ +int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi *fl) +{ + *dst = NULL; + return ip6_dst_lookup_tail(sk, dst, fl); +} EXPORT_SYMBOL_GPL(ip6_dst_lookup); +/** + * ip6_sk_dst_lookup - perform socket cached route lookup on flow + * @sk: socket which provides the dst cache and route info + * @dst: pointer to dst_entry * for result + * @fl: flow to lookup + * + * This function performs a route lookup on the given flow with the + * possibility of using the cached route in the socket if it is valid. + * It will take the socket dst lock when operating on the dst cache. + * As a result, this function can only be used in process context. + * + * It returns zero on success, or a standard errno code on error. + */ +int ip6_sk_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi *fl) +{ + *dst = NULL; + if (sk) { + *dst = sk_dst_check(sk, inet6_sk(sk)->dst_cookie); + *dst = ip6_sk_dst_check(sk, *dst, fl); + } + + return ip6_dst_lookup_tail(sk, dst, fl); +} +EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup); + static inline int ip6_ufo_append_data(struct sock *sk, int getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb), diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 923989d..b76fd7f 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -270,7 +270,7 @@ static int tcp_v6_connect(struct sock *s inet->rcv_saddr = LOOPBACK4_IPV6; sk->sk_gso_type = SKB_GSO_TCPV6; - ip6_dst_store(sk, dst, NULL); + __ip6_dst_store(sk, dst, NULL); icsk->icsk_ext_hdr_len = 0; if (np->opt) @@ -947,7 +947,7 @@ static struct sock * tcp_v6_syn_recv_soc */ sk->sk_gso_type = SKB_GSO_TCPV6; - ip6_dst_store(newsk, dst, NULL); + __ip6_dst_store(newsk, dst, NULL); newtcp6sk = (struct tcp6_sock *)newsk; inet_sk(newsk)->pinet6 = &newtcp6sk->inet6; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index ccc57f4..3d54f24 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -782,7 +782,7 @@ do_udp_sendmsg: connected = 0; } - err = ip6_dst_lookup(sk, &dst, fl); + err = ip6_sk_dst_lookup(sk, &dst, fl); if (err) goto out; if (final_p)