* [PATCH]: tipc: Fix oops on send prior to entering networked mode @ 2010-02-19 19:40 Neil Horman 2010-02-22 22:44 ` Stephens, Allan 2010-03-02 18:33 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) Neil Horman 0 siblings, 2 replies; 28+ messages in thread From: Neil Horman @ 2010-02-19 19:40 UTC (permalink / raw) To: netdev Cc: per.liden, jon.maloy, allan.stephens, tipc-discussion, davem, nhorman Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE user programs can oops the kernel by sending datagrams via AF_TIPC prior to entering networked mode. The following backtrace has been observed: ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" #0 [ffff81002d9a5810] crash_kexec at ffffffff800ac5b9 #1 [ffff81002d9a58d0] __die at ffffffff80065127 #2 [ffff81002d9a5910] do_page_fault at ffffffff80066da7 #3 [ffff81002d9a5a00] error_exit at ffffffff8005dde9 [exception RIP: tipc_node_select_next_hop+90] RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1 #5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558 #6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d #7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79 #8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b #9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261 RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 ORIG_RAX: 000000000000002c CS: 0033 SS: 002b What happens is that, when the tipc module in inserted it enters a standalone node mode in which communication to its own address is allowed <0.0.0> but not to other addresses, since the appropriate data structures have not been allocated yet (specifically the tipc_net pointer). There is nothing stopping a client from trying to send such a message however, and if that happens, we attempt to dereference tipc_net.zones while the pointer is still NULL, and explode. The fix is to add a check at the start of the send_msg path to ensure that we've allocated the tipc_net pointers and entered networked mode prior to allowing a send to any destination other than our loopback address. This patch has received minimal testing, but fixes the issue. Through reviews are appreciated. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Per Liden <per.liden@ericsson.com> CC: Jon Maloy <jon.maloy@ericsson.com> CC: Allan Stephens <allan.stephens@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: Neil Horman <nhorman@tuxdriver.com> CC: tipc-discussion@lists.sourceforge.net net.c | 2 +- socket.c | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/net/tipc/net.c b/net/tipc/net.c index 7906608..512b33c 100644 --- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -278,7 +278,6 @@ int tipc_net_start(u32 addr) tipc_cfg_stop(); tipc_own_addr = addr; - tipc_mode = TIPC_NET_MODE; tipc_named_reinit(); tipc_port_reinit(); @@ -289,6 +288,7 @@ int tipc_net_start(u32 addr) return res; } + tipc_mode = TIPC_NET_MODE; tipc_k_signal((Handler)tipc_subscr_start, 0); tipc_k_signal((Handler)tipc_cfg_init, 0); diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 1ea64f0..45229fd 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -526,6 +526,10 @@ static int send_msg(struct kiocb *iocb, struct socket *sock, if (iocb) lock_sock(sk); + if ((tipc_mode != TIPC_NET_MODE) && + (dest->addr.name.domain != tipc_own_addr)) + return -EHOSTUNREACH; + needs_conn = (sock->state != SS_READY); if (unlikely(needs_conn)) { if (sock->state == SS_LISTENING) { ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-19 19:40 [PATCH]: tipc: Fix oops on send prior to entering networked mode Neil Horman @ 2010-02-22 22:44 ` Stephens, Allan 2010-02-23 1:11 ` Neil Horman 2010-03-02 18:33 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) Neil Horman 1 sibling, 1 reply; 28+ messages in thread From: Stephens, Allan @ 2010-02-22 22:44 UTC (permalink / raw) To: Neil Horman, netdev; +Cc: per.liden, jon.maloy, tipc-discussion, davem Hi Neil: Good work on spotting this bug, and in tracking down the cause. I took a look at your patch, but there are a couple of problems I can see with the approach you've taken to fix things: 1) The check you've added to send_msg() in net/tipc/socket.c will help prevent things from blowing up if the message sender is using an AF_MIPC socket from user space, but it won't help prevent a similar oops if an "application" uses TIPC's native API to send a message directly from kernel space. 2) The other change you made to defer setting tipc_mode to TIPC_NET_MODE will cause problems if TIPC has to bail out during tipc_net_start() because of problems. Specifically, the ensuing call to tipc_net_stop() won't get a chance to clean up any partial initialization that got done prior to the initialization problem, which could result in memory leaks. Fortunately, I think there's a relatively easy solution. Since TIPC always needs to call tipc_node_select() in order to send an off-node message, you should be able to add the necessary error checking there. I'm thinking of something along the lines of: static inline struct tipc_node *tipc_node_select(u32 addr, u32 selector) { if (likely(in_own_cluster(addr))) return tipc_local_nodes ? tipc_local_nodes[tipc_node(addr)] : NULL; return tipc_net->zones ? tipc_node_select_next_hop(addr, selector) : NULL; } Please give this a try and let us know how things go. Regards, Al > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Friday, February 19, 2010 2:41 PM > To: netdev@vger.kernel.org > Cc: per.liden@ericsson.com; jon.maloy@ericsson.com; Stephens, > Allan; tipc-discussion@lists.sourceforge.net; > davem@davemloft.net; nhorman@tuxdriver.com > Subject: [PATCH]: tipc: Fix oops on send prior to entering > networked mode > > Fix TIPC to disallow sending to remote addresses prior to > entering NET_MODE > > user programs can oops the kernel by sending datagrams via > AF_TIPC prior to entering networked mode. The following > backtrace has been observed: > > ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" > #0 [ffff81002d9a5810] crash_kexec at ffffffff800ac5b9 > #1 [ffff81002d9a58d0] __die at ffffffff80065127 > #2 [ffff81002d9a5910] do_page_fault at ffffffff80066da7 > #3 [ffff81002d9a5a00] error_exit at ffffffff8005dde9 > [exception RIP: tipc_node_select_next_hop+90] > RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 > RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 > R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 > R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1 > #5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558 > #6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d > #7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79 > #8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b > #9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261 > RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 > RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 > RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 > RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 > ORIG_RAX: 000000000000002c CS: 0033 SS: 002b > > What happens is that, when the tipc module in inserted it > enters a standalone node mode in which communication to its > own address is allowed <0.0.0> but not to other addresses, > since the appropriate data structures have not been allocated > yet (specifically the tipc_net pointer). There is nothing > stopping a client from trying to send such a message however, > and if that happens, we attempt to dereference tipc_net.zones > while the pointer is still NULL, and explode. The fix is to > add a check at the start of the send_msg path to ensure that > we've allocated the tipc_net pointers and entered networked > mode prior to allowing a send to any destination other than > our loopback address. > > This patch has received minimal testing, but fixes the issue. > Through reviews are appreciated. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > CC: Per Liden <per.liden@ericsson.com> > CC: Jon Maloy <jon.maloy@ericsson.com> > CC: Allan Stephens <allan.stephens@windriver.com> > CC: David S. Miller <davem@davemloft.net> > CC: Neil Horman <nhorman@tuxdriver.com> > CC: tipc-discussion@lists.sourceforge.net > > > net.c | 2 +- > socket.c | 4 ++++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/net/tipc/net.c b/net/tipc/net.c index > 7906608..512b33c 100644 > --- a/net/tipc/net.c > +++ b/net/tipc/net.c > @@ -278,7 +278,6 @@ int tipc_net_start(u32 addr) > tipc_cfg_stop(); > > tipc_own_addr = addr; > - tipc_mode = TIPC_NET_MODE; > tipc_named_reinit(); > tipc_port_reinit(); > > @@ -289,6 +288,7 @@ int tipc_net_start(u32 addr) > return res; > } > > + tipc_mode = TIPC_NET_MODE; > tipc_k_signal((Handler)tipc_subscr_start, 0); > tipc_k_signal((Handler)tipc_cfg_init, 0); > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c index > 1ea64f0..45229fd 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -526,6 +526,10 @@ static int send_msg(struct kiocb *iocb, > struct socket *sock, > if (iocb) > lock_sock(sk); > > + if ((tipc_mode != TIPC_NET_MODE) && > + (dest->addr.name.domain != tipc_own_addr)) > + return -EHOSTUNREACH; > + > needs_conn = (sock->state != SS_READY); > if (unlikely(needs_conn)) { > if (sock->state == SS_LISTENING) { > ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-22 22:44 ` Stephens, Allan @ 2010-02-23 1:11 ` Neil Horman 2010-02-23 15:02 ` Stephens, Allan 0 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-02-23 1:11 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, per.liden, jon.maloy, tipc-discussion, davem On Mon, Feb 22, 2010 at 02:44:48PM -0800, Stephens, Allan wrote: > Hi Neil: > > Good work on spotting this bug, and in tracking down the cause. I took > a look at your patch, but there are a couple of problems I can see with > the approach you've taken to fix things: > Thanks. Like I mentioned in my initial post, my approach only had minimal testing, and I'm not supprised that my approach had some rough edges. > 1) The check you've added to send_msg() in net/tipc/socket.c will help > prevent things from blowing up if the message sender is using an AF_MIPC > socket from user space, but it won't help prevent a similar oops if an > "application" uses TIPC's native API to send a message directly from > kernel space. > > 2) The other change you made to defer setting tipc_mode to TIPC_NET_MODE > will cause problems if TIPC has to bail out during tipc_net_start() > because of problems. Specifically, the ensuing call to tipc_net_stop() > won't get a chance to clean up any partial initialization that got done > prior to the initialization problem, which could result in memory leaks. > > Fortunately, I think there's a relatively easy solution. Since TIPC > always needs to call tipc_node_select() in order to send an off-node > message, you should be able to add the necessary error checking there. > I'm thinking of something along the lines of: > That seems like it might be a problem in and of itself. If the startup/shutdown code relies on the state of the implementation, perhaps that worth cleaning up so as to avoid a race condition. > static inline struct tipc_node *tipc_node_select(u32 addr, u32 selector) > { > if (likely(in_own_cluster(addr))) > return tipc_local_nodes ? > tipc_local_nodes[tipc_node(addr)] : NULL; > return tipc_net->zones ? tipc_node_select_next_hop(addr, > selector) : NULL; > } > > Please give this a try and let us know how things go. > I'm happy to give this a try, but I'm a bit concerned by this approach. It certainly seems like it will solve the problem at hand, but it doesn't seem to address the underlying cause, which is that you can execute code paths which the state of the implementation doesn't/shouldn't allow. In other words, this solve the immediate problem, but it still allows someone to try send data via tipc before tipc is ready to send data. It would be nice if we could deal with the larger problem. I'll let you know how the above patch goes. Regards Neil > Regards, > Al > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Friday, February 19, 2010 2:41 PM > > To: netdev@vger.kernel.org > > Cc: per.liden@ericsson.com; jon.maloy@ericsson.com; Stephens, > > Allan; tipc-discussion@lists.sourceforge.net; > > davem@davemloft.net; nhorman@tuxdriver.com > > Subject: [PATCH]: tipc: Fix oops on send prior to entering > > networked mode > > > > Fix TIPC to disallow sending to remote addresses prior to > > entering NET_MODE > > > > user programs can oops the kernel by sending datagrams via > > AF_TIPC prior to entering networked mode. The following > > backtrace has been observed: > > > > ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" > > #0 [ffff81002d9a5810] crash_kexec at ffffffff800ac5b9 > > #1 [ffff81002d9a58d0] __die at ffffffff80065127 > > #2 [ffff81002d9a5910] do_page_fault at ffffffff80066da7 > > #3 [ffff81002d9a5a00] error_exit at ffffffff8005dde9 > > [exception RIP: tipc_node_select_next_hop+90] > > RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 > > RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 > > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 > > RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 > > R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 > > R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > > #4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1 > > #5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558 > > #6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d > > #7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79 > > #8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b > > #9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261 > > RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 > > RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 > > RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 > > RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 > > ORIG_RAX: 000000000000002c CS: 0033 SS: 002b > > > > What happens is that, when the tipc module in inserted it > > enters a standalone node mode in which communication to its > > own address is allowed <0.0.0> but not to other addresses, > > since the appropriate data structures have not been allocated > > yet (specifically the tipc_net pointer). There is nothing > > stopping a client from trying to send such a message however, > > and if that happens, we attempt to dereference tipc_net.zones > > while the pointer is still NULL, and explode. The fix is to > > add a check at the start of the send_msg path to ensure that > > we've allocated the tipc_net pointers and entered networked > > mode prior to allowing a send to any destination other than > > our loopback address. > > > > This patch has received minimal testing, but fixes the issue. > > Through reviews are appreciated. > > > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > > > CC: Per Liden <per.liden@ericsson.com> > > CC: Jon Maloy <jon.maloy@ericsson.com> > > CC: Allan Stephens <allan.stephens@windriver.com> > > CC: David S. Miller <davem@davemloft.net> > > CC: Neil Horman <nhorman@tuxdriver.com> > > CC: tipc-discussion@lists.sourceforge.net > > > > > > net.c | 2 +- > > socket.c | 4 ++++ > > 2 files changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/net/tipc/net.c b/net/tipc/net.c index > > 7906608..512b33c 100644 > > --- a/net/tipc/net.c > > +++ b/net/tipc/net.c > > @@ -278,7 +278,6 @@ int tipc_net_start(u32 addr) > > tipc_cfg_stop(); > > > > tipc_own_addr = addr; > > - tipc_mode = TIPC_NET_MODE; > > tipc_named_reinit(); > > tipc_port_reinit(); > > > > @@ -289,6 +288,7 @@ int tipc_net_start(u32 addr) > > return res; > > } > > > > + tipc_mode = TIPC_NET_MODE; > > tipc_k_signal((Handler)tipc_subscr_start, 0); > > tipc_k_signal((Handler)tipc_cfg_init, 0); > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c index > > 1ea64f0..45229fd 100644 > > --- a/net/tipc/socket.c > > +++ b/net/tipc/socket.c > > @@ -526,6 +526,10 @@ static int send_msg(struct kiocb *iocb, > > struct socket *sock, > > if (iocb) > > lock_sock(sk); > > > > + if ((tipc_mode != TIPC_NET_MODE) && > > + (dest->addr.name.domain != tipc_own_addr)) > > + return -EHOSTUNREACH; > > + > > needs_conn = (sock->state != SS_READY); > > if (unlikely(needs_conn)) { > > if (sock->state == SS_LISTENING) { > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-23 1:11 ` Neil Horman @ 2010-02-23 15:02 ` Stephens, Allan 2010-02-23 16:09 ` Neil Horman 0 siblings, 1 reply; 28+ messages in thread From: Stephens, Allan @ 2010-02-23 15:02 UTC (permalink / raw) To: Neil Horman; +Cc: jon.maloy, netdev, tipc-discussion, davem Neil wrote: > > 2) The other change you made to defer setting tipc_mode to > > TIPC_NET_MODE will cause problems if TIPC has to bail out during > > tipc_net_start() because of problems. Specifically, the > ensuing call > > to tipc_net_stop() won't get a chance to clean up any partial > > initialization that got done prior to the initialization > problem, which could result in memory leaks. > > > > Fortunately, I think there's a relatively easy solution. > Since TIPC > > always needs to call tipc_node_select() in order to send an > off-node > > message, you should be able to add the necessary error > checking there. > > I'm thinking of something along the lines of: > > > That seems like it might be a problem in and of itself. If > the startup/shutdown code relies on the state of the > implementation, perhaps that worth cleaning up so as to avoid > a race condition. I'm afraid I don't understand what you mean here. It sounds like you're saying that TIPC's code shouldn't rely on the state of its own implementation. > > static inline struct tipc_node *tipc_node_select(u32 addr, u32 > > selector) { > > if (likely(in_own_cluster(addr))) > > return tipc_local_nodes ? > > tipc_local_nodes[tipc_node(addr)] : NULL; > > return tipc_net->zones ? tipc_node_select_next_hop(addr, > > selector) : NULL; > > } > > > > Please give this a try and let us know how things go. > > > I'm happy to give this a try, but I'm a bit concerned by this > approach. It certainly seems like it will solve the problem > at hand, but it doesn't seem to address the underlying cause, > which is that you can execute code paths which the state of > the implementation doesn't/shouldn't allow. In other words, > this solve the immediate problem, but it still allows someone > to try send data via tipc before tipc is ready to send data. > It would be nice if we could deal with the larger problem. I don't think you've stated the issue correctly. The problem you encountered isn't that TIPC is allowing users to send data before TIPC is ready, it's that TIPC is allowing users to send data *off-node* before TIPC is ready. TIPC was deliberately designed so that messages could be sent within the node itself as soon as possible, without having to wait for full connectivity to the rest of the network, and this it does quite well. Later, once connectivity to the network is established, TIPC allows applications to send data to other nodes, and this also appears to work properly providing the applications use TIPC's location transparent service naming form of addressing. What we don't appear to have anticipated is that someone would attempt to send messages to another node without first receiving an indication from TIPC that the node could be reached. As far as I can tell, the only way that the crash you encountered could be generated would be if your application blindly tries to send to a named service on a specified node without first waiting for notification that the node exists. (Please correct me if I'm wrong about this.) While this operation is certainly legal, including a specific node address in a send operation kind of defeats the whole location transparent addressing premise on which TIPC is based and I'm wondering if it's really necessary to do things this way. Regardless, the fix I've suggested seems to me to be a reasonable way of blocking premature sends off-node while still permitting on-node sends to work, and should make everyone happy. > I'll let you know how the above patch goes. Thanks, Al ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-23 15:02 ` Stephens, Allan @ 2010-02-23 16:09 ` Neil Horman 2010-02-23 16:21 ` Stephens, Allan 0 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-02-23 16:09 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, jon.maloy, tipc-discussion, davem On Tue, Feb 23, 2010 at 07:02:16AM -0800, Stephens, Allan wrote: > Neil wrote: > > > > 2) The other change you made to defer setting tipc_mode to > > > TIPC_NET_MODE will cause problems if TIPC has to bail out during > > > tipc_net_start() because of problems. Specifically, the > > ensuing call > > > to tipc_net_stop() won't get a chance to clean up any partial > > > initialization that got done prior to the initialization > > problem, which could result in memory leaks. > > > > > > Fortunately, I think there's a relatively easy solution. > > Since TIPC > > > always needs to call tipc_node_select() in order to send an > > off-node > > > message, you should be able to add the necessary error > > checking there. > > > I'm thinking of something along the lines of: > > > > > That seems like it might be a problem in and of itself. If > > the startup/shutdown code relies on the state of the > > implementation, perhaps that worth cleaning up so as to avoid > > a race condition. > > I'm afraid I don't understand what you mean here. It sounds like you're > saying that TIPC's code shouldn't rely on the state of its own > implementation. > Not at all, I'm saying quite the opposite, that TIPC should rely on its own state, but in its current implementation: 1) It doesn't (i.e. theres no check in the send path for messages that the internal mode is TIPC_NET_MODE if the destination address is not for the local z.c.n tuple). 2) It couldn't rely on the internal state if it did check (i.e tipc_net_start sets tipc_mode to TIPC_NET_MODE prior to initalizing the data structures required to support sending messages off node). So een if we did do a check for being in NET mode prior to sending a message, it would be useless because theres a window of time where the implementation says its ok to send, but its really not (thats what caused the above oops). > > > > static inline struct tipc_node *tipc_node_select(u32 addr, u32 > > > selector) { > > > if (likely(in_own_cluster(addr))) > > > return tipc_local_nodes ? > > > tipc_local_nodes[tipc_node(addr)] : NULL; > > > return tipc_net->zones ? tipc_node_select_next_hop(addr, > > > selector) : NULL; > > > } > > > > > > Please give this a try and let us know how things go. > > > > > I'm happy to give this a try, but I'm a bit concerned by this > > approach. It certainly seems like it will solve the problem > > at hand, but it doesn't seem to address the underlying cause, > > which is that you can execute code paths which the state of > > the implementation doesn't/shouldn't allow. In other words, > > this solve the immediate problem, but it still allows someone > > to try send data via tipc before tipc is ready to send data. > > It would be nice if we could deal with the larger problem. > > I don't think you've stated the issue correctly. The problem you > encountered isn't that TIPC is allowing users to send data before TIPC > is ready, it's that TIPC is allowing users to send data *off-node* > before TIPC is ready. TIPC was deliberately designed so that messages Well, yes. Sorry for not being clear. We only need to check that we're in net mode if we're not sending to the local node. > could be sent within the node itself as soon as possible, without having > to wait for full connectivity to the rest of the network, and this it My initial patch checked for this: + if ((tipc_mode != TIPC_NET_MODE) && + (dest->addr.name.domain != tipc_own_addr)) + return -EHOSTUNREACH; + If we're not in NET mode but the destination is not tipc_own_addr (initialized to <0.0.0>) we can still send. > does quite well. Later, once connectivity to the network is > established, TIPC allows applications to send data to other nodes, and > this also appears to work properly providing the applications use TIPC's > location transparent service naming form of addressing. > Yes, it works fine, as long as user space applications all do the right things, > What we don't appear to have anticipated is that someone would attempt > to send messages to another node without first receiving an indication > from TIPC that the node could be reached. As far as I can tell, the > only way that the crash you encountered could be generated would be if > your application blindly tries to send to a named service on a specified > node without first waiting for notification that the node exists. > (Please correct me if I'm wrong about this.) While this operation is No, you got that exactly right :). > certainly legal, including a specific node address in a send operation > kind of defeats the whole location transparent addressing premise on > which TIPC is based and I'm wondering if it's really necessary to do > things this way. Regardless, the fix I've suggested seems to me to be a As you said, its perfectly legal, and one of the mandates of the kernel is to prevent unprivlidged user space applications from taking down the entire system when they do something stupid. I completely agree with you that user space is doing bad things here, but bad things need to result in errors and broken applications, not crashed systems. > reasonable way of blocking premature sends off-node while still > permitting on-node sends to work, and should make everyone happy. > I agree that you patch fixes the exact problem that I reported here, but theres more to it than that. A quick grep of the tipc stack reveals the following symbols: tipc_bearers media_list tipc_local_nodes bcbearer bclink tipc_net.zones All of these symbols: 1) Are allocated dynamically in tipc_net_start, _after_ tipc_mode is set to TIPC_NET_MODE 2) dereferenced without NULL pointer checks in either the send path or in the netlink configuration path, both of which are reachable from user space. So your patch fixes the last item on your list, but what about the others? In fact, I'll bet I can very quickly change the application to trip over a null tipc_local_nodes dereference by changing the destination address to be something within zone 0, cluster 0. I really think the proper solution needs to be modifying the send and control paths to gate on the internal state, and modify the init/shutdown code to change state properly. I'll let you know what I come up with Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-23 16:09 ` Neil Horman @ 2010-02-23 16:21 ` Stephens, Allan 2010-02-24 18:53 ` Neil Horman 0 siblings, 1 reply; 28+ messages in thread From: Stephens, Allan @ 2010-02-23 16:21 UTC (permalink / raw) To: Neil Horman; +Cc: jon.maloy, netdev, tipc-discussion, davem Neil wrote: > I agree that you patch fixes the exact problem that I > reported here, but theres more to it than that. A quick grep > of the tipc stack reveals the following > symbols: > tipc_bearers > media_list > tipc_local_nodes > bcbearer > bclink > tipc_net.zones > > All of these symbols: > > 1) Are allocated dynamically in tipc_net_start, _after_ > tipc_mode is set to TIPC_NET_MODE > > 2) dereferenced without NULL pointer checks in either the > send path or in the netlink configuration path, both of which > are reachable from user space. > > So your patch fixes the last item on your list, but what > about the others? In fact, I'll bet I can very quickly > change the application to trip over a null tipc_local_nodes > dereference by changing the destination address to be > something within zone 0, cluster 0. The semantics of TIPC addressing don't allow a node address of the form <0.0.N> where N != 0, so this kind of a send ateempt should be caught and handled by TIPC. However, you've already found one missing error check, so it's certainly worth trying it out! Regards, Al ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-23 16:21 ` Stephens, Allan @ 2010-02-24 18:53 ` Neil Horman 2010-02-24 19:05 ` Stephens, Allan 0 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-02-24 18:53 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, jon.maloy, tipc-discussion, davem On Tue, Feb 23, 2010 at 08:21:25AM -0800, Stephens, Allan wrote: > Neil wrote: > > > I agree that you patch fixes the exact problem that I > > reported here, but theres more to it than that. A quick grep > > of the tipc stack reveals the following > > symbols: > > tipc_bearers > > media_list > > tipc_local_nodes > > bcbearer > > bclink > > tipc_net.zones > > > > All of these symbols: > > > > 1) Are allocated dynamically in tipc_net_start, _after_ > > tipc_mode is set to TIPC_NET_MODE > > > > 2) dereferenced without NULL pointer checks in either the > > send path or in the netlink configuration path, both of which > > are reachable from user space. > > > > So your patch fixes the last item on your list, but what > > about the others? In fact, I'll bet I can very quickly > > change the application to trip over a null tipc_local_nodes > > dereference by changing the destination address to be > > something within zone 0, cluster 0. > > The semantics of TIPC addressing don't allow a node address of the form > <0.0.N> where N != 0, so this kind of a send ateempt should be caught > and handled by TIPC. However, you've already found one missing error > check, so it's certainly worth trying it out! > So, I've tested out your patch, and it fixes the problem that was reported (no suprisingly, it was pretty clear that it would), it also managed to fix the access to tipc_local_nodes (I'd previously missed the chunk of the patch that added the ? to that access), so thats all great. Then I did this: ./tipc-config -lsr 1.1.10:eth3-1.1.17:eth2 And got this: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4 IP: [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 [tipc] PGD 11be1a067 PUD 11c721067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 3 Pid: 1284, comm: tipc-config Not tainted 2.6.33-rc8 #1 0YK962/PowerEdge SC1435 RIP: 0010:[<ffffffffa030f6a8>] [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 [tipc] RSP: 0018:ffff88011c7b3a08 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88011c7b3900 RDX: 0000000000000000 RSI: ffff88011c7b39c0 RDI: ffff88011c7b3a3c RBP: ffff88011c7b3a28 R08: 0000000000000032 R09: 000000000000000a R10: ffffffffa03258b8 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88011c7b3a3c R14: 0000000000000040 R15: ffffffff829a89b0 FS: 00007fd2d1c56700(0000) GS:ffff880082400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c4 CR3: 000000011c7aa000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process tipc-config (pid: 1284, threadinfo ffff88011c7b2000, task ffff88011cba48a0) Stack: ffff880023022260 ffff88011c7b3a38 ffff880023022260 ffff88011c7b3aa0 <0> ffff88011c7b3a88 ffffffffa031294c 336874650100100a ffff880023022200 <0> 0100101100000040 ffffff0032687465 ffff88011c7b3a88 00000000c78c4377 Call Trace: [<ffffffffa031294c>] link_find_link+0x40/0x9d [tipc] [<ffffffffa0312fce>] ? tipc_link_cmd_reset_stats+0x63/0xbb [tipc] [<ffffffffa0312fda>] tipc_link_cmd_reset_stats+0x6f/0xbb [tipc] [<ffffffffa031093d>] tipc_cfg_do_cmd+0x2ae/0x7d4 [tipc] [<ffffffffa031a506>] handle_cmd+0x68/0xba [tipc] [<ffffffff813d493f>] genl_rcv_msg+0x1c7/0x1eb [<ffffffff813d4778>] ? genl_rcv_msg+0x0/0x1eb [<ffffffff813d3895>] netlink_rcv_skb+0x43/0x94 [<ffffffff813d4771>] genl_rcv+0x26/0x2d [<ffffffff813d3664>] netlink_unicast+0x125/0x18e [<ffffffff813d3e4e>] netlink_sendmsg+0x259/0x268 [<ffffffff813a57d4>] __sock_sendmsg+0x5e/0x69 [<ffffffff813a7f37>] sock_aio_write+0xc0/0xd4 [<ffffffff8107e57f>] ? print_lock_contention_bug+0x1b/0xe0 [<ffffffff81114f59>] do_sync_write+0xc4/0x101 [<ffffffff811e5cd9>] ? selinux_file_permission+0xa7/0xb3 [<ffffffff811dbdf1>] ? security_file_permission+0x16/0x18 [<ffffffff811154e8>] vfs_write+0xc1/0x10b [<ffffffff8107cb35>] ? trace_hardirqs_on_caller+0x1f/0x141 [<ffffffff811155f2>] sys_write+0x4a/0x6e [<ffffffff81009b82>] system_call_fastpath+0x16/0x1b Code: 5f 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 1d 86 68 01 00 45 31 e4 49 89 fd <83> bb c4 00 00 00 00 74 1e 48 8d 7b 5c be 3a 00 00 00 e8 RIP [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 [tipc] RSP <ffff88011c7b3a08> CR2: 00000000000000c4 ---[ end trace a14d3b6105c45243 ]--- The use of that command was arbitrary, it was just one of the paths from user space to one of the variables that I mentioned previously. And my patch wouldn't have fixed this either, but its illustrative of my earlier assertion, that theres no real synchronization between the user-space accessable paths in tipc and the implementation state which should gate access to those paths. Now we could continue to add NULL checks whereever these bugs pop up (and perhaps in the above case specifically that might be valid, I'm not sure yet), but that just feels like a loosing battle that we might never quite catch up with, as the code evolves. Seems to me like a better solution would be adding common gating in the netlink and send paths to ensure that only when the system was properly configured could you send various messages into the stack. I'll try to put an omibus patch together shortly. Thanks! Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-24 18:53 ` Neil Horman @ 2010-02-24 19:05 ` Stephens, Allan 2010-02-24 21:15 ` Neil Horman ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Stephens, Allan @ 2010-02-24 19:05 UTC (permalink / raw) To: Neil Horman; +Cc: netdev, jon.maloy, tipc-discussion, davem Hi Neil: Have you tried upgrading your system to use TIPC 1.7.6 (available at http://tipc.sourceforge.net/tipc_linux.html)? This is a significant revised and enhanced version of TIPC that hasn't yet made its way into mainsteam Linux, and seems to be the version-of-choice for most TIPC users. It also appears to avoid a number of the issues that currently exist in TIPC 1.6, including the one caused by the random configuration command you mentioned in your email below. I didn't have a problem with you working on a small patch for TIPC 1.6 to get around a limited problem, but I'd hate to see you waste time on fixing issues that have already been addressed in TIPC 1.7. Regards, Al > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Wednesday, February 24, 2010 1:54 PM > To: Stephens, Allan > Cc: netdev@vger.kernel.org; jon.maloy@ericsson.com; > tipc-discussion@lists.sourceforge.net; davem@davemloft.net > Subject: Re: [PATCH]: tipc: Fix oops on send prior to > entering networked mode > > On Tue, Feb 23, 2010 at 08:21:25AM -0800, Stephens, Allan wrote: > > Neil wrote: > > > > > I agree that you patch fixes the exact problem that I > reported here, > > > but theres more to it than that. A quick grep of the tipc stack > > > reveals the following > > > symbols: > > > tipc_bearers > > > media_list > > > tipc_local_nodes > > > bcbearer > > > bclink > > > tipc_net.zones > > > > > > All of these symbols: > > > > > > 1) Are allocated dynamically in tipc_net_start, _after_ > tipc_mode is > > > set to TIPC_NET_MODE > > > > > > 2) dereferenced without NULL pointer checks in either the > send path > > > or in the netlink configuration path, both of which are reachable > > > from user space. > > > > > > So your patch fixes the last item on your list, but what > about the > > > others? In fact, I'll bet I can very quickly change the > application > > > to trip over a null tipc_local_nodes dereference by changing the > > > destination address to be something within zone 0, cluster 0. > > > > The semantics of TIPC addressing don't allow a node address of the > > form <0.0.N> where N != 0, so this kind of a send ateempt should be > > caught and handled by TIPC. However, you've already found > one missing > > error check, so it's certainly worth trying it out! > > > > So, I've tested out your patch, and it fixes the problem that > was reported (no suprisingly, it was pretty clear that it > would), it also managed to fix the access to tipc_local_nodes > (I'd previously missed the chunk of the patch that added the > ? to that access), so thats all great. Then I did this: > ./tipc-config -lsr 1.1.10:eth3-1.1.17:eth2 > > And got this: > > BUG: unable to handle kernel NULL pointer dereference at > 00000000000000c4 > IP: [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 > [tipc] PGD 11be1a067 PUD 11c721067 PMD 0 > Oops: 0000 [#1] SMP > last sysfs file: > /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map > CPU 3 > Pid: 1284, comm: tipc-config Not tainted 2.6.33-rc8 #1 > 0YK962/PowerEdge SC1435 > RIP: 0010:[<ffffffffa030f6a8>] [<ffffffffa030f6a8>] > tipc_bearer_find_interface+0x1f/0x66 [tipc] > RSP: 0018:ffff88011c7b3a08 EFLAGS: 00010246 > RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88011c7b3900 > RDX: 0000000000000000 RSI: ffff88011c7b39c0 RDI: ffff88011c7b3a3c > RBP: ffff88011c7b3a28 R08: 0000000000000032 R09: 000000000000000a > R10: ffffffffa03258b8 R11: 0000000000000000 R12: 0000000000000000 > R13: ffff88011c7b3a3c R14: 0000000000000040 R15: ffffffff829a89b0 > FS: 00007fd2d1c56700(0000) GS:ffff880082400000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00000000000000c4 CR3: 000000011c7aa000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 Process tipc-config (pid: 1284, threadinfo > ffff88011c7b2000, task > ffff88011cba48a0) > Stack: > ffff880023022260 ffff88011c7b3a38 ffff880023022260 > ffff88011c7b3aa0 <0> ffff88011c7b3a88 ffffffffa031294c > 336874650100100a ffff880023022200 <0> 0100101100000040 > ffffff0032687465 ffff88011c7b3a88 00000000c78c4377 Call Trace: > [<ffffffffa031294c>] link_find_link+0x40/0x9d [tipc] > [<ffffffffa0312fce>] ? tipc_link_cmd_reset_stats+0x63/0xbb > [tipc] [<ffffffffa0312fda>] > tipc_link_cmd_reset_stats+0x6f/0xbb [tipc] > [<ffffffffa031093d>] tipc_cfg_do_cmd+0x2ae/0x7d4 [tipc] > [<ffffffffa031a506>] handle_cmd+0x68/0xba [tipc] > [<ffffffff813d493f>] genl_rcv_msg+0x1c7/0x1eb > [<ffffffff813d4778>] ? genl_rcv_msg+0x0/0x1eb > [<ffffffff813d3895>] netlink_rcv_skb+0x43/0x94 > [<ffffffff813d4771>] genl_rcv+0x26/0x2d [<ffffffff813d3664>] > netlink_unicast+0x125/0x18e [<ffffffff813d3e4e>] > netlink_sendmsg+0x259/0x268 [<ffffffff813a57d4>] > __sock_sendmsg+0x5e/0x69 [<ffffffff813a7f37>] > sock_aio_write+0xc0/0xd4 [<ffffffff8107e57f>] ? > print_lock_contention_bug+0x1b/0xe0 > [<ffffffff81114f59>] do_sync_write+0xc4/0x101 > [<ffffffff811e5cd9>] ? selinux_file_permission+0xa7/0xb3 > [<ffffffff811dbdf1>] ? security_file_permission+0x16/0x18 > [<ffffffff811154e8>] vfs_write+0xc1/0x10b > [<ffffffff8107cb35>] ? trace_hardirqs_on_caller+0x1f/0x141 > [<ffffffff811155f2>] sys_write+0x4a/0x6e > [<ffffffff81009b82>] system_call_fastpath+0x16/0x1b > Code: 5f 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 55 > 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 1d 86 68 01 00 45 > 31 e4 49 89 fd <83> bb c4 00 00 00 00 74 1e 48 8d 7b 5c be 3a > 00 00 00 e8 RIP [<ffffffffa030f6a8>] > tipc_bearer_find_interface+0x1f/0x66 [tipc] RSP <ffff88011c7b3a08> > CR2: 00000000000000c4 > ---[ end trace a14d3b6105c45243 ]--- > > The use of that command was arbitrary, it was just one of the > paths from user space to one of the variables that I > mentioned previously. And my patch wouldn't have fixed this > either, but its illustrative of my earlier assertion, that > theres no real synchronization between the user-space > accessable paths in tipc and the implementation state which > should gate access to those paths. > > Now we could continue to add NULL checks whereever these bugs > pop up (and perhaps in the above case specifically that might > be valid, I'm not sure yet), but that just feels like a > loosing battle that we might never quite catch up with, as > the code evolves. Seems to me like a better solution would > be adding common gating in the netlink and send paths to > ensure that only when the system was properly configured > could you send various messages into the stack. > > I'll try to put an omibus patch together shortly. > > Thanks! > Neil > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-24 19:05 ` Stephens, Allan @ 2010-02-24 21:15 ` Neil Horman 2010-02-25 1:38 ` David Miller 2010-02-24 21:19 ` Neil Horman 2010-02-25 1:34 ` David Miller 2 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-02-24 21:15 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, jon.maloy, tipc-discussion, davem On Wed, Feb 24, 2010 at 11:05:12AM -0800, Stephens, Allan wrote: > Hi Neil: > > Have you tried upgrading your system to use TIPC 1.7.6 (available at > http://tipc.sourceforge.net/tipc_linux.html)? This is a significant > revised and enhanced version of TIPC that hasn't yet made its way into > mainsteam Linux, and seems to be the version-of-choice for most TIPC > users. It also appears to avoid a number of the issues that currently > exist in TIPC 1.6, including the one caused by the random configuration > command you mentioned in your email below. > > I didn't have a problem with you working on a small patch for TIPC 1.6 > to get around a limited problem, but I'd hate to see you waste time on > fixing issues that have already been addressed in TIPC 1.7. > Yeah, yeah, it looks like 1.7.6 peppered the config paths with tipc_mode checks all over the place. Fine, can you post a patch here of the change that added that? Can you also post a version of the patch that you posted for the tipc_local_node patch and tipc_net that davem can apply (since that fixed the remaining problems). That just leaves the race condition on the mode setting (in which there is a time between the setting of tipc_mode and the completion of tipc_net_start during which you can pass all the mode checks without having all the data initalized. I'll send a patch for that shortly. As for this being a waste of time, its really not. Despite having a later version that developers might like to use, most distributions fork the upstream kernel directly, and assume fixes appear here. Even if developers never use the tipc module that ships with the upstream kernel, just having it built in case anyone wants to use it opens those distributions up to critical bugs, like this one, which allows unprivlidged local users to crash the system. And for those distributions, 'Just go get the latest source' really isn't a viable option in many/most cases. If its not fixed in the public kernel, then the code isn't very worthwhile to anyone. And for those distributions, 'Just go get the latest source' really isn't a viable option in many/most cases. Looking at the 1.7.6 tarball on sourceforge, its dated 10/10/2008, so you've basically got a 1.5 year lag between the development version and the commited version that distributions are using (and counting). I'm sorry, I'm not trying to be crass about this, but its rather disconcerting to see that kind of discrepancy between the code development point and whats in net-next. It implies that the only fix for a tipc problem is a wholesale upgrade of the tipc directory in the kernel (since it would seem the sourceforge tipc cvs tree has gone unused for a few years). Based on that it seems unwise for any distribution to default include tipc in its kernel. Would you agree? Anywho, given what you have in the tarball at the sourceforge site, a patch that adds all the requisite TIPC_NET_MODE checks as well as the previous patch to check tipc_net and tipc_local_node I think satisfies the bug at hand, if you would please post those with your sign-off, I'll ack them, and I'll post a patch for the remaining race shortly. Thanks & Regards Neil > Regards, > Al > > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Wednesday, February 24, 2010 1:54 PM > > To: Stephens, Allan > > Cc: netdev@vger.kernel.org; jon.maloy@ericsson.com; > > tipc-discussion@lists.sourceforge.net; davem@davemloft.net > > Subject: Re: [PATCH]: tipc: Fix oops on send prior to > > entering networked mode > > > > On Tue, Feb 23, 2010 at 08:21:25AM -0800, Stephens, Allan wrote: > > > Neil wrote: > > > > > > > I agree that you patch fixes the exact problem that I > > reported here, > > > > but theres more to it than that. A quick grep of the tipc stack > > > > reveals the following > > > > symbols: > > > > tipc_bearers > > > > media_list > > > > tipc_local_nodes > > > > bcbearer > > > > bclink > > > > tipc_net.zones > > > > > > > > All of these symbols: > > > > > > > > 1) Are allocated dynamically in tipc_net_start, _after_ > > tipc_mode is > > > > set to TIPC_NET_MODE > > > > > > > > 2) dereferenced without NULL pointer checks in either the > > send path > > > > or in the netlink configuration path, both of which are reachable > > > > from user space. > > > > > > > > So your patch fixes the last item on your list, but what > > about the > > > > others? In fact, I'll bet I can very quickly change the > > application > > > > to trip over a null tipc_local_nodes dereference by changing the > > > > destination address to be something within zone 0, cluster 0. > > > > > > The semantics of TIPC addressing don't allow a node address of the > > > form <0.0.N> where N != 0, so this kind of a send ateempt should be > > > caught and handled by TIPC. However, you've already found > > one missing > > > error check, so it's certainly worth trying it out! > > > > > > > So, I've tested out your patch, and it fixes the problem that > > was reported (no suprisingly, it was pretty clear that it > > would), it also managed to fix the access to tipc_local_nodes > > (I'd previously missed the chunk of the patch that added the > > ? to that access), so thats all great. Then I did this: > > ./tipc-config -lsr 1.1.10:eth3-1.1.17:eth2 > > > > And got this: > > > > BUG: unable to handle kernel NULL pointer dereference at > > 00000000000000c4 > > IP: [<ffffffffa030f6a8>] tipc_bearer_find_interface+0x1f/0x66 > > [tipc] PGD 11be1a067 PUD 11c721067 PMD 0 > > Oops: 0000 [#1] SMP > > last sysfs file: > > /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map > > CPU 3 > > Pid: 1284, comm: tipc-config Not tainted 2.6.33-rc8 #1 > > 0YK962/PowerEdge SC1435 > > RIP: 0010:[<ffffffffa030f6a8>] [<ffffffffa030f6a8>] > > tipc_bearer_find_interface+0x1f/0x66 [tipc] > > RSP: 0018:ffff88011c7b3a08 EFLAGS: 00010246 > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffff88011c7b3900 > > RDX: 0000000000000000 RSI: ffff88011c7b39c0 RDI: ffff88011c7b3a3c > > RBP: ffff88011c7b3a28 R08: 0000000000000032 R09: 000000000000000a > > R10: ffffffffa03258b8 R11: 0000000000000000 R12: 0000000000000000 > > R13: ffff88011c7b3a3c R14: 0000000000000040 R15: ffffffff829a89b0 > > FS: 00007fd2d1c56700(0000) GS:ffff880082400000(0000) > > knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 00000000000000c4 CR3: 000000011c7aa000 CR4: 00000000000006e0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > > 0000000000000400 Process tipc-config (pid: 1284, threadinfo > > ffff88011c7b2000, task > > ffff88011cba48a0) > > Stack: > > ffff880023022260 ffff88011c7b3a38 ffff880023022260 > > ffff88011c7b3aa0 <0> ffff88011c7b3a88 ffffffffa031294c > > 336874650100100a ffff880023022200 <0> 0100101100000040 > > ffffff0032687465 ffff88011c7b3a88 00000000c78c4377 Call Trace: > > [<ffffffffa031294c>] link_find_link+0x40/0x9d [tipc] > > [<ffffffffa0312fce>] ? tipc_link_cmd_reset_stats+0x63/0xbb > > [tipc] [<ffffffffa0312fda>] > > tipc_link_cmd_reset_stats+0x6f/0xbb [tipc] > > [<ffffffffa031093d>] tipc_cfg_do_cmd+0x2ae/0x7d4 [tipc] > > [<ffffffffa031a506>] handle_cmd+0x68/0xba [tipc] > > [<ffffffff813d493f>] genl_rcv_msg+0x1c7/0x1eb > > [<ffffffff813d4778>] ? genl_rcv_msg+0x0/0x1eb > > [<ffffffff813d3895>] netlink_rcv_skb+0x43/0x94 > > [<ffffffff813d4771>] genl_rcv+0x26/0x2d [<ffffffff813d3664>] > > netlink_unicast+0x125/0x18e [<ffffffff813d3e4e>] > > netlink_sendmsg+0x259/0x268 [<ffffffff813a57d4>] > > __sock_sendmsg+0x5e/0x69 [<ffffffff813a7f37>] > > sock_aio_write+0xc0/0xd4 [<ffffffff8107e57f>] ? > > print_lock_contention_bug+0x1b/0xe0 > > [<ffffffff81114f59>] do_sync_write+0xc4/0x101 > > [<ffffffff811e5cd9>] ? selinux_file_permission+0xa7/0xb3 > > [<ffffffff811dbdf1>] ? security_file_permission+0x16/0x18 > > [<ffffffff811154e8>] vfs_write+0xc1/0x10b > > [<ffffffff8107cb35>] ? trace_hardirqs_on_caller+0x1f/0x141 > > [<ffffffff811155f2>] sys_write+0x4a/0x6e > > [<ffffffff81009b82>] system_call_fastpath+0x16/0x1b > > Code: 5f 5b 41 5c 41 5d 41 5e 41 5f c9 c3 55 48 89 e5 41 55 > > 41 54 53 48 83 ec 08 0f 1f 44 00 00 48 8b 1d 86 68 01 00 45 > > 31 e4 49 89 fd <83> bb c4 00 00 00 00 74 1e 48 8d 7b 5c be 3a > > 00 00 00 e8 RIP [<ffffffffa030f6a8>] > > tipc_bearer_find_interface+0x1f/0x66 [tipc] RSP <ffff88011c7b3a08> > > CR2: 00000000000000c4 > > ---[ end trace a14d3b6105c45243 ]--- > > > > The use of that command was arbitrary, it was just one of the > > paths from user space to one of the variables that I > > mentioned previously. And my patch wouldn't have fixed this > > either, but its illustrative of my earlier assertion, that > > theres no real synchronization between the user-space > > accessable paths in tipc and the implementation state which > > should gate access to those paths. > > > > Now we could continue to add NULL checks whereever these bugs > > pop up (and perhaps in the above case specifically that might > > be valid, I'm not sure yet), but that just feels like a > > loosing battle that we might never quite catch up with, as > > the code evolves. Seems to me like a better solution would > > be adding common gating in the netlink and send paths to > > ensure that only when the system was properly configured > > could you send various messages into the stack. > > > > I'll try to put an omibus patch together shortly. > > > > Thanks! > > Neil > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-24 21:15 ` Neil Horman @ 2010-02-25 1:38 ` David Miller 2010-02-25 14:24 ` Stephens, Allan 0 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2010-02-25 1:38 UTC (permalink / raw) To: nhorman; +Cc: allan.stephens, netdev, jon.maloy, tipc-discussion From: Neil Horman <nhorman@tuxdriver.com> Date: Wed, 24 Feb 2010 16:15:07 -0500 > Looking at the 1.7.6 tarball on sourceforge, its dated 10/10/2008, > so you've basically got a 1.5 year lag between the development > version and the commited version that distributions are using (and > counting). I'm sorry, I'm not trying to be crass about this, but > its rather disconcerting to see that kind of discrepancy between the > code development point and whats in net-next. It implies that the > only fix for a tipc problem is a wholesale upgrade of the tipc > directory in the kernel (since it would seem the sourceforge tipc > cvs tree has gone unused for a few years). I'm extremely upset about this. Especially given the fact that EVERY DAMN TIME there were TIPC patches posted I applied EVERY DAMN ONE of them, and I did so in an expediant manner. So there is zero reason for the TIPC protocol development to not have occured upstream. I, in fact, facilitated things to work that way in the smoothest way possible, if only the TIPC developers had obliged. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 1:38 ` David Miller @ 2010-02-25 14:24 ` Stephens, Allan 2010-02-25 15:06 ` David Miller ` (2 more replies) 0 siblings, 3 replies; 28+ messages in thread From: Stephens, Allan @ 2010-02-25 14:24 UTC (permalink / raw) To: David Miller, nhorman; +Cc: jon.maloy, netdev, tipc-discussion > From: David Miller [mailto:davem@davemloft.net] > Sent: Wednesday, February 24, 2010 8:38 PM > > Looking at the 1.7.6 tarball on sourceforge, its dated > 10/10/2008, so > > you've basically got a 1.5 year lag between the development version > > and the commited version that distributions are using (and > counting). > > I'm sorry, I'm not trying to be crass about this, but its rather > > disconcerting to see that kind of discrepancy between the code > > development point and whats in net-next. It implies that > the only fix > > for a tipc problem is a wholesale upgrade of the tipc > directory in the > > kernel (since it would seem the sourceforge tipc cvs tree has gone > > unused for a few years). > > I'm extremely upset about this. > > Especially given the fact that EVERY DAMN TIME there were > TIPC patches posted I applied EVERY DAMN ONE of them, and I > did so in an expediant manner. > > So there is zero reason for the TIPC protocol development to > not have occured upstream. I, in fact, facilitated things to > work that way in the smoothest way possible, if only the TIPC > developers had obliged. I think it is important to understand how things got to be the way they are with TIPC before any finger pointing starts. When TIPC 1.6 was first integrated into Linux (back in 2.6.16) there was some considerable flak generated because it was done as a bulk patch, rather than being trickled in as a series of smaller, logically distinct patches. Since then, I have carefully ensured that all subsequent patches were submitted in the conventional Linux manner to avoid giving anyone cause for complaint. (And there can be no argument about the fact that David has been very good about applying the patches as soon as they were made available.) Later, the company that employs me found it necessary to enhance TIPC with new capabilities which would be included in a couple of our operating systems (one of which is Linux-based). To meet our schedule, it was necessary to make a large number of major changes to TIPC, and it was felt that submitting these relatively untested changes to mainstream Linux would be potentially destabilzing and therefore undesirable. Thus, development was done downstream under the "TIPC 1.7" banner, with the intent that the changes would be pushed back upstream once we knew they wouldn't negatively impact existing functionality. And so it went at first -- about 65 patches from TIPC 1.7 were in fact incorporated into Linux in 2008. However, because TIPC 1.7 was not developed using git (remember that Linux was only one of the operating systems involved), there was no existing set of small, logically distinct patches that could be applied to mainstream Linux, and each of the patches that was submitted had to be carefully drafted from the finished code base in a time-consuming manual process. Then came an economic downturn, and our company was forced to divert development resources away from TIPC and the remainder of the TIPC 1.7 code remained downstream-only (although freely available, and supported, on SourceForge). And while the status quo is admittedly less-than-optimal, it has apparently served the needs of the TIPC user community for the last couple of years and shouldn't give anyone cause to get upset. So much for past history. I'm more interested in understanding how we can best move forward on completing the job of integrating TIPC 1.7 into Linux. As far as I can tell there seem to be two routes forward. One is to continue following the previous method of extracting small patches from TIPC 1.7 and submitting them one by one; the other is to integrate the remaining code using a small number of bulk patches (i.e. the kind of ones that people complained about when TIPC 1.6 was integrated). The problem with insisting on the first approach is that it is simply too time consuming. So, would it be acceptable to David (and everyone else) if we took the latter approach as a one-time-only exception to the normal process? Or is there a third option available that could be done to integrate the rest of TIPC 1.7 quickly? Regards, Al PS. I'm holding back on replying to other previous emails from David and Neil in this thread since deciding how to fix the oops that triggered this whole discussion depends on whether or not we're going to integrate the rest of TIPC 1.7 or not. ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 14:24 ` Stephens, Allan @ 2010-02-25 15:06 ` David Miller 2010-02-25 16:24 ` Stephens, Allan 2010-02-25 15:13 ` David Miller 2010-02-25 15:23 ` Neil Horman 2 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2010-02-25 15:06 UTC (permalink / raw) To: allan.stephens; +Cc: nhorman, netdev, jon.maloy, tipc-discussion, hadi From: "Stephens, Allan" <allan.stephens@windriver.com> Date: Thu, 25 Feb 2010 06:24:15 -0800 > PS. I'm holding back on replying to other previous emails from David and > Neil in this thread since deciding how to fix the oops that triggered > this whole discussion depends on whether or not we're going to integrate > the rest of TIPC 1.7 or not. Are you out of your mind? We're fixing the OOPS in the smallest and most expediant way possible. Integrating all of TIPC 1.7 to fix this OOPS is utter and pure insanity. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 15:06 ` David Miller @ 2010-02-25 16:24 ` Stephens, Allan 0 siblings, 0 replies; 28+ messages in thread From: Stephens, Allan @ 2010-02-25 16:24 UTC (permalink / raw) To: David Miller; +Cc: jon.maloy, netdev, hadi, tipc-discussion, nhorman > From: David Miller [mailto:davem@davemloft.net] > Sent: Thursday, February 25, 2010 10:06 AM > > PS. I'm holding back on replying to other previous emails > from David > > and Neil in this thread since deciding how to fix the oops that > > triggered this whole discussion depends on whether or not > we're going > > to integrate the rest of TIPC 1.7 or not. > > Are you out of your mind? > > We're fixing the OOPS in the smallest and most expediant way possible. > > Integrating all of TIPC 1.7 to fix this OOPS is utter and > pure insanity. That's what I thought you'd say, but I still thought it was worth asking the question. I've got no problem with limiting the changes to the matter currently at hand. -- Al ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 14:24 ` Stephens, Allan 2010-02-25 15:06 ` David Miller @ 2010-02-25 15:13 ` David Miller 2010-02-25 15:23 ` Neil Horman 2 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2010-02-25 15:13 UTC (permalink / raw) To: allan.stephens; +Cc: nhorman, netdev, jon.maloy, tipc-discussion, hady From: "Stephens, Allan" <allan.stephens@windriver.com> Date: Thu, 25 Feb 2010 06:24:15 -0800 > Later, the company that employs me found it necessary to enhance TIPC > with new capabilities which would be included in a couple of our > operating systems (one of which is Linux-based). To meet our schedule, > it was necessary to make a large number of major changes to TIPC, and it > was felt that submitting these relatively untested changes to mainstream > Linux would be potentially destabilzing and therefore undesirable. Things were going just fine until Ericsson farmed the TIPC stuff out to you guys, really. In fact, an incredible amount of effort was made by the Ericsson folks to get the TIPC stack upstream in the first place. And now you guys made all of that basically for naught by taking your work downstream. A healthy upstream project turned into a "business decision." Thanks! The fact is, you would have had LESS work to do if you have integrated your work upstream as a rule. Us upstream folks would have been handling any and all networking API changes transparently for you. And people who run automated tools to validate code and look for bugs would have been fixing bugs in TIPC for you. The list goes on an on. But it doesn't make any "business sense" for you to work on your code upstream so you didn't do it. And now you want to suggest that we dump huge unreviewable chunks of code into the tree, again because it's less work for _YOU_. Thanks! ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 14:24 ` Stephens, Allan 2010-02-25 15:06 ` David Miller 2010-02-25 15:13 ` David Miller @ 2010-02-25 15:23 ` Neil Horman 2010-02-25 20:33 ` Stephens, Allan 2 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-02-25 15:23 UTC (permalink / raw) To: Stephens, Allan; +Cc: David Miller, netdev, jon.maloy, tipc-discussion On Thu, Feb 25, 2010 at 06:24:15AM -0800, Stephens, Allan wrote: > > From: David Miller [mailto:davem@davemloft.net] > > Sent: Wednesday, February 24, 2010 8:38 PM > > > > Looking at the 1.7.6 tarball on sourceforge, its dated > > 10/10/2008, so > > > you've basically got a 1.5 year lag between the development version > > > and the commited version that distributions are using (and > > counting). > > > I'm sorry, I'm not trying to be crass about this, but its rather > > > disconcerting to see that kind of discrepancy between the code > > > development point and whats in net-next. It implies that > > the only fix > > > for a tipc problem is a wholesale upgrade of the tipc > > directory in the > > > kernel (since it would seem the sourceforge tipc cvs tree has gone > > > unused for a few years). > > > > I'm extremely upset about this. > > > > Especially given the fact that EVERY DAMN TIME there were > > TIPC patches posted I applied EVERY DAMN ONE of them, and I > > did so in an expediant manner. > > > > So there is zero reason for the TIPC protocol development to > > not have occured upstream. I, in fact, facilitated things to > > work that way in the smoothest way possible, if only the TIPC > > developers had obliged. > > I think it is important to understand how things got to be the way they > are with TIPC before any finger pointing starts. > > When TIPC 1.6 was first integrated into Linux (back in 2.6.16) there was > some considerable flak generated because it was done as a bulk patch, > rather than being trickled in as a series of smaller, logically distinct > patches. Since then, I have carefully ensured that all subsequent > patches were submitted in the conventional Linux manner to avoid giving > anyone cause for complaint. (And there can be no argument about the > fact that David has been very good about applying the patches as soon as > they were made available.) > > Later, the company that employs me found it necessary to enhance TIPC > with new capabilities which would be included in a couple of our > operating systems (one of which is Linux-based). To meet our schedule, > it was necessary to make a large number of major changes to TIPC, and it > was felt that submitting these relatively untested changes to mainstream > Linux would be potentially destabilzing and therefore undesirable. > Thus, development was done downstream under the "TIPC 1.7" banner, with > the intent that the changes would be pushed back upstream once we knew > they wouldn't negatively impact existing functionality. And so it went > at first -- about 65 patches from TIPC 1.7 were in fact incorporated > into Linux in 2008. However, because TIPC 1.7 was not developed using > git (remember that Linux was only one of the operating systems > involved), there was no existing set of small, logically distinct > patches that could be applied to mainstream Linux, and each of the > patches that was submitted had to be carefully drafted from the finished > code base in a time-consuming manual process. > > Then came an economic downturn, and our company was forced to divert > development resources away from TIPC and the remainder of the TIPC 1.7 > code remained downstream-only (although freely available, and supported, > on SourceForge). And while the status quo is admittedly > less-than-optimal, it has apparently served the needs of the TIPC user > community for the last couple of years and shouldn't give anyone cause > to get upset. > > So much for past history. I'm more interested in understanding how we > can best move forward on completing the job of integrating TIPC 1.7 into > Linux. As far as I can tell there seem to be two routes forward. One > is to continue following the previous method of extracting small patches > from TIPC 1.7 and submitting them one by one; the other is to integrate > the remaining code using a small number of bulk patches (i.e. the kind > of ones that people complained about when TIPC 1.6 was integrated). The > problem with insisting on the first approach is that it is simply too > time consuming. So, would it be acceptable to David (and everyone else) > if we took the latter approach as a one-time-only exception to the > normal process? Or is there a third option available that could be done > to integrate the rest of TIPC 1.7 quickly? > > Regards, > Al > > PS. I'm holding back on replying to other previous emails from David and > Neil in this thread since deciding how to fix the oops that triggered > this whole discussion depends on whether or not we're going to integrate > the rest of TIPC 1.7 or not. > > > > While I sympathize with your history, this really stick out of your explination for me: """ >To meet our schedule, > it was necessary to make a large number of major changes to TIPC, and it > was felt that submitting these relatively untested changes to mainstream > Linux would be potentially destabilzing and therefore undesirable. > Thus, development was done downstream under the "TIPC 1.7" banner, with > the intent that the changes would be pushed back upstream once we knew > they wouldn't negatively impact existing functionality. """ To me I see that and read it as: "We didn't have time to do things the way the Linux maintainers like to have them done, so we went off and did it on our own, figuring we'd get back to doing it the right way later" Well, flash forward 1.5 years, and getting back to it never happened. While I understand that can happen, it puts users in the lurch if they expect upstream to be the latest bits. As such, have you considered just dropping TIPC from upstream entirely? I know that sounds a bit angry, I don't intend it to, I mean it in all seriousness. Companies have constraints that sometimes conflict with upstream practices though, thats just a fact of life, and theres nothing we can really do about that. But if the review/accptance criteria of upstream development doesn't fit into a companies budget or schedule, not doing kernel community development might be the best solution. Doing so tells the distros that they have extra work to do if they want to incorporate/support tipc in their kernels, and frees your company to develop on its own schedule, and according to its own resources, while still being accessable to end users. Of course, in so doing you don't get the maintenence changes that come with an ever evolving ABI/etc, but thats the price you would have to pay, and a decision you could make. I'm not trying to tell you its the best solution, just an alternate option. Honestly, what I'd like to see is all the remaining changes in TIPC 1.7 go in as individual commits, so that we have a good change history, and a resonable review on all the changes. I know that takes longer but I think its the right solution. I'll even volunteer to help, if that lightens your load any. If you can provide me access to whatever scm you used (or even some modicum of changelog in a text file, so that we could try to match up changes with documentation), I'll happily start pulling stuff apart to get it upstream. But if thats a one shot deal, and you believe that the next schedule crunch you have will result in your company making another out-of-tree update, then perhaps switching to that mode of out-of-tree development might be worth consideration. Regards Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 15:23 ` Neil Horman @ 2010-02-25 20:33 ` Stephens, Allan 2010-02-25 21:14 ` Neil Horman 0 siblings, 1 reply; 28+ messages in thread From: Stephens, Allan @ 2010-02-25 20:33 UTC (permalink / raw) To: Neil Horman, David Miller; +Cc: netdev, jon.maloy, tipc-discussion > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Thursday, February 25, 2010 10:23 AM [text deleted] > To me I see that and read it as: > "We didn't have time to do things the way the Linux > maintainers like to have them done, so we went off and did it > on our own, figuring we'd get back to doing it the right way later" > > Well, flash forward 1.5 years, and getting back to it never > happened. While I understand that can happen, it puts users > in the lurch if they expect upstream to be the latest bits. > As such, have you considered just dropping TIPC from upstream > entirely? I know that sounds a bit angry, I don't intend it > to, I mean it in all seriousness. Companies have constraints > that sometimes conflict with upstream practices though, thats > just a fact of life, and theres nothing we can really do > about that. But if the review/accptance criteria of upstream > development doesn't fit into a companies budget or schedule, > not doing kernel community development might be the best > solution. Doing so tells the distros that they have extra > work to do if they want to incorporate/support tipc in their > kernels, and frees your company to develop on its own > schedule, and according to its own resources, while still > being accessable to end users. Of course, in so doing you > don't get the maintenence changes that come with an ever > evolving ABI/etc, but thats the price you would have to pay, > and a decision you could make. > > I'm not trying to tell you its the best solution, just an > alternate option. > Honestly, what I'd like to see is all the remaining changes > in TIPC 1.7 go in as individual commits, so that we have a > good change history, and a resonable review on all the > changes. I know that takes longer but I think its the right > solution. I'll even volunteer to help, if that lightens your > load any. If you can provide me access to whatever scm you > used (or even some modicum of changelog in a text file, so > that we could try to match up changes with documentation), > I'll happily start pulling stuff apart to get it upstream. I can't argue with the basic thrust of your comments, Neil, nor with the similar comments David has made in his previous emails. The current situation is certainly regrettable, and I think we're all in agreement that the best solution is to get the remainder of TIPC 1.7 into Linux as soon as possible (as individual patches, not a single mega-patch), then to limit any future changes to the upstream side of things. I've got no problem with Neil's offer to assist in getting the necessary patches generated, as this will undoubtedly speed things along faster than anything I could do on my own. (I suggest that we get started on this once we get the current oops situation resolved.) This kind of offer to share the burden is to be commended, and is an example that other TIPC users should look to. > But if thats a one shot deal, and you believe that the next > schedule crunch you have will result in your company making > another out-of-tree update, then perhaps switching to that > mode of out-of-tree development might be worth consideration. > > Regards > Neil Actually, we've got no plans to continue doing out-of-tree development on TIPC -- we've already suffered enough pain by following that route, and I certainly don't want to repeat the experience! The upside of having done things out-of-tree in the past is that it gives us hard data we can use to squelch discussion the next time somebody suggests doing it again. Regards, Al ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 20:33 ` Stephens, Allan @ 2010-02-25 21:14 ` Neil Horman 0 siblings, 0 replies; 28+ messages in thread From: Neil Horman @ 2010-02-25 21:14 UTC (permalink / raw) To: Stephens, Allan; +Cc: David Miller, netdev, jon.maloy, tipc-discussion On Thu, Feb 25, 2010 at 12:33:37PM -0800, Stephens, Allan wrote: > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Thursday, February 25, 2010 10:23 AM > > [text deleted] > > > To me I see that and read it as: > > "We didn't have time to do things the way the Linux > > maintainers like to have them done, so we went off and did it > > on our own, figuring we'd get back to doing it the right way later" > > > > Well, flash forward 1.5 years, and getting back to it never > > happened. While I understand that can happen, it puts users > > in the lurch if they expect upstream to be the latest bits. > > As such, have you considered just dropping TIPC from upstream > > entirely? I know that sounds a bit angry, I don't intend it > > to, I mean it in all seriousness. Companies have constraints > > that sometimes conflict with upstream practices though, thats > > just a fact of life, and theres nothing we can really do > > about that. But if the review/accptance criteria of upstream > > development doesn't fit into a companies budget or schedule, > > not doing kernel community development might be the best > > solution. Doing so tells the distros that they have extra > > work to do if they want to incorporate/support tipc in their > > kernels, and frees your company to develop on its own > > schedule, and according to its own resources, while still > > being accessable to end users. Of course, in so doing you > > don't get the maintenence changes that come with an ever > > evolving ABI/etc, but thats the price you would have to pay, > > and a decision you could make. > > > > I'm not trying to tell you its the best solution, just an > > alternate option. > > Honestly, what I'd like to see is all the remaining changes > > in TIPC 1.7 go in as individual commits, so that we have a > > good change history, and a resonable review on all the > > changes. I know that takes longer but I think its the right > > solution. I'll even volunteer to help, if that lightens your > > load any. If you can provide me access to whatever scm you > > used (or even some modicum of changelog in a text file, so > > that we could try to match up changes with documentation), > > I'll happily start pulling stuff apart to get it upstream. > > I can't argue with the basic thrust of your comments, Neil, nor with the > similar comments David has made in his previous emails. The current > situation is certainly regrettable, and I think we're all in agreement > that the best solution is to get the remainder of TIPC 1.7 into Linux as > soon as possible (as individual patches, not a single mega-patch), then > to limit any future changes to the upstream side of things. > > I've got no problem with Neil's offer to assist in getting the necessary > patches generated, as this will undoubtedly speed things along faster > than anything I could do on my own. (I suggest that we get started on > this once we get the current oops situation resolved.) This kind of > offer to share the burden is to be commended, and is an example that > other TIPC users should look to. > Well, I'm glad to help, let me know what I can do, and what data you can put in my hands to get it done. FWIW, I'm looking over the missing TIPC_NET_MODE checks in the tarball, and I think they're small enough that we can roll them in with the other changes to the send path pretty easily. I've also run accross a locking issue (just theoretical, nothing I've demonstrated to myself yet). But it appears to exist in the 1.7 tarball as well as upstream. I'll roll it all up and post a omnibus patch to fix the opposes in the next day or two. Neil > > > But if thats a one shot deal, and you believe that the next > > schedule crunch you have will result in your company making > > another out-of-tree update, then perhaps switching to that > > mode of out-of-tree development might be worth consideration. > > > > Regards > > Neil > > Actually, we've got no plans to continue doing out-of-tree development > on TIPC -- we've already suffered enough pain by following that route, > and I certainly don't want to repeat the experience! The upside of > having done things out-of-tree in the past is that it gives us hard data > we can use to squelch discussion the next time somebody suggests doing > it again. > > Regards, > Al > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-24 19:05 ` Stephens, Allan 2010-02-24 21:15 ` Neil Horman @ 2010-02-24 21:19 ` Neil Horman 2010-02-25 1:34 ` David Miller 2 siblings, 0 replies; 28+ messages in thread From: Neil Horman @ 2010-02-24 21:19 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, jon.maloy, tipc-discussion, davem As noted in my previous note, I'd agreed tht the preiviously posted patches for the various tipc patches we encountered were sufficient for the oops at hand. This patch closes the race condition thats left open after the application of those patches. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/net/tipc/tipc.h | 3 ++- net/tipc/net.c | 8 +++++--- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/include/net/tipc/tipc.h b/include/net/tipc/tipc.h index 9566608..14076e9 100644 --- a/include/net/tipc/tipc.h +++ b/include/net/tipc/tipc.h @@ -54,7 +54,8 @@ u32 tipc_get_addr(void); #define TIPC_NOT_RUNNING 0 #define TIPC_NODE_MODE 1 -#define TIPC_NET_MODE 2 +#define TIPC_TRANS_MODE 2 +#define TIPC_NET_MODE 3 typedef void (*tipc_mode_event)(void *usr_handle, int mode, u32 addr); diff --git a/net/tipc/net.c b/net/tipc/net.c index 7906608..daeafe7 100644 --- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -278,7 +278,7 @@ int tipc_net_start(u32 addr) tipc_cfg_stop(); tipc_own_addr = addr; - tipc_mode = TIPC_NET_MODE; + tipc_mode = TIPC_TRANS_MODE; tipc_named_reinit(); tipc_port_reinit(); @@ -295,18 +295,20 @@ int tipc_net_start(u32 addr) info("Started in network mode\n"); info("Own node address %s, network identity %u\n", addr_string_fill(addr_string, tipc_own_addr), tipc_net_id); + tipc_mode = TIPC_NET_MODE; return 0; } void tipc_net_stop(void) { - if (tipc_mode != TIPC_NET_MODE) + if (tipc_mode < TIPC_TRANS_MODE) return; + tipc_mode = TIPC_TRANS_MODE; write_lock_bh(&tipc_net_lock); tipc_bearer_stop(); - tipc_mode = TIPC_NODE_MODE; tipc_bclink_stop(); net_stop(); + tipc_mode = TIPC_NODE_MODE; write_unlock_bh(&tipc_net_lock); info("Left network mode \n"); } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-24 19:05 ` Stephens, Allan 2010-02-24 21:15 ` Neil Horman 2010-02-24 21:19 ` Neil Horman @ 2010-02-25 1:34 ` David Miller 2010-02-25 1:42 ` Neil Horman 2 siblings, 1 reply; 28+ messages in thread From: David Miller @ 2010-02-25 1:34 UTC (permalink / raw) To: allan.stephens; +Cc: nhorman, netdev, jon.maloy, tipc-discussion From: "Stephens, Allan" <allan.stephens@windriver.com> Date: Wed, 24 Feb 2010 11:05:12 -0800 > Have you tried upgrading your system to use TIPC 1.7.6 (available at > http://tipc.sourceforge.net/tipc_linux.html)? This is a significant > revised and enhanced version of TIPC that hasn't yet made its way into > mainsteam Linux, and seems to be the version-of-choice for most TIPC > users. It also appears to avoid a number of the issues that currently > exist in TIPC 1.6, including the one caused by the random configuration > command you mentioned in your email below. > > I didn't have a problem with you working on a small patch for TIPC 1.6 > to get around a limited problem, but I'd hate to see you waste time on > fixing issues that have already been addressed in TIPC 1.7. Why is the TIPC kernel protocol code being developed out of tree instead of directly upstream? Regardless of that, if the code is in the kernel tree and it can be OOPS'd, we must fix it. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode 2010-02-25 1:34 ` David Miller @ 2010-02-25 1:42 ` Neil Horman 0 siblings, 0 replies; 28+ messages in thread From: Neil Horman @ 2010-02-25 1:42 UTC (permalink / raw) To: David Miller; +Cc: allan.stephens, netdev, jon.maloy, tipc-discussion On Wed, Feb 24, 2010 at 05:34:36PM -0800, David Miller wrote: > From: "Stephens, Allan" <allan.stephens@windriver.com> > Date: Wed, 24 Feb 2010 11:05:12 -0800 > > > Have you tried upgrading your system to use TIPC 1.7.6 (available at > > http://tipc.sourceforge.net/tipc_linux.html)? This is a significant > > revised and enhanced version of TIPC that hasn't yet made its way into > > mainsteam Linux, and seems to be the version-of-choice for most TIPC > > users. It also appears to avoid a number of the issues that currently > > exist in TIPC 1.6, including the one caused by the random configuration > > command you mentioned in your email below. > > > > I didn't have a problem with you working on a small patch for TIPC 1.6 > > to get around a limited problem, but I'd hate to see you waste time on > > fixing issues that have already been addressed in TIPC 1.7. > > Why is the TIPC kernel protocol code being developed out of tree > instead of directly upstream? > > Regardless of that, if the code is in the kernel tree and it can be > OOPS'd, we must fix it. > Agreed, with a combination of: 1) The patch which Allan posted to check tipc_net.zone and tipc_local_nodes for being NULL in tipc_select_node 2) The patch that I asked Allan to extract from the sourceforge package which adds checks for TIPC_NET_MODE in the appropriate places 3) The patch I posted to close the race in the setting of tipc_mode we should fix all the oopses I've found so far. Ideally it would seem what would be most appropriate would be to wholesale import tipc 1.7.6 from sourceforge to the kernel to get all the fixes that have gone in there. That seems like its been waiting in the wings for over a year now. Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-02-19 19:40 [PATCH]: tipc: Fix oops on send prior to entering networked mode Neil Horman 2010-02-22 22:44 ` Stephens, Allan @ 2010-03-02 18:33 ` Neil Horman 2010-03-03 16:51 ` Stephens, Allan 2010-03-08 20:19 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) David Miller 1 sibling, 2 replies; 28+ messages in thread From: Neil Horman @ 2010-03-02 18:33 UTC (permalink / raw) To: netdev; +Cc: jon.maloy, allan.stephens, tipc-discussion, davem Ok, after some debate, heres version 2 of this patch. Its a complete rewrite. I started with the patches we'd proposed previously, then realized theres a much easier and elegant way to handle this, which I've implemented below. Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE user programs can oops the kernel by sending datagrams via AF_TIPC prior to entering networked mode. The following backtrace has been observed: ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" [exception RIP: tipc_node_select_next_hop+90] RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1 #5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558 #6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d #7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79 #8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b #9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261 RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 ORIG_RAX: 000000000000002c CS: 0033 SS: 002b What happens is that, when the tipc module in inserted it enters a standalone node mode in which communication to its own address is allowed <0.0.0> but not to other addresses, since the appropriate data structures have not been allocated yet (specifically the tipc_net pointer). There is nothing stopping a client from trying to send such a message however, and if that happens, we attempt to dereference tipc_net.zones while the pointer is still NULL, and explode. The fix is pretty straightforward. Since these oopses all arise from the dereference of global pointers prior to their assignment to allocated values, and since these allocations are small (about 2k total), lets convert these pointers to static arrays of the appropriate size. All the accesses to these bits consider 0/NULL to be a non match when searching, so all the lookups still work properly, and there is no longer a chance of a bad dererence anywhere. As a bonus, this lets us eliminate the setup/teardown routines for those pointers, and elimnates the need to preform any locking around them to prevent access while their being allocated/freed. I've updated the tipc_net structure to behave this way to fix the exact reported problem, and also fixed up the tipc_bearers and media_list arrays to fix an obvious simmilar problem that arises from issuing tipc-config commands to manipulate bearers/links prior to entering networked mode I've tested this for a few hours by running the sanity tests and stress test with the tipcutils suite, and nothing has fallen over. There have been a few lockdep warnings, but those were there before, and can be addressed later, as they didn't actually result in any deadlock. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Allan Stephens <allan.stephens@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: tipc-discussion@lists.sourceforge.net bearer.c | 28 ++-------------------------- bearer.h | 2 +- net.c | 25 ++++--------------------- 3 files changed, 7 insertions(+), 48 deletions(-) diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 327011f..d1b9226 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -45,10 +45,10 @@ #define MAX_ADDR_STR 32 -static struct media *media_list = NULL; +static struct media media_list[MAX_MEDIA]; static u32 media_count = 0; -struct bearer *tipc_bearers = NULL; +struct bearer tipc_bearers[MAX_BEARERS]; /** * media_name_valid - validate media name @@ -660,26 +660,6 @@ int tipc_disable_bearer(const char *name) -int tipc_bearer_init(void) -{ - int res; - - write_lock_bh(&tipc_net_lock); - tipc_bearers = kcalloc(MAX_BEARERS, sizeof(struct bearer), GFP_ATOMIC); - media_list = kcalloc(MAX_MEDIA, sizeof(struct media), GFP_ATOMIC); - if (tipc_bearers && media_list) { - res = 0; - } else { - kfree(tipc_bearers); - kfree(media_list); - tipc_bearers = NULL; - media_list = NULL; - res = -ENOMEM; - } - write_unlock_bh(&tipc_net_lock); - return res; -} - void tipc_bearer_stop(void) { u32 i; @@ -695,10 +675,6 @@ void tipc_bearer_stop(void) if (tipc_bearers[i].active) bearer_disable(tipc_bearers[i].publ.name); } - kfree(tipc_bearers); - kfree(media_list); - tipc_bearers = NULL; - media_list = NULL; media_count = 0; } diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index ca57348..000228e 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -114,7 +114,7 @@ struct bearer_name { struct link; -extern struct bearer *tipc_bearers; +extern struct bearer tipc_bearers[]; void tipc_media_addr_printf(struct print_buf *pb, struct tipc_media_addr *a); struct sk_buff *tipc_media_get_names(void); diff --git a/net/tipc/net.c b/net/tipc/net.c index 7906608..f25b1cd 100644 --- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -116,7 +116,8 @@ */ DEFINE_RWLOCK(tipc_net_lock); -struct network tipc_net = { NULL }; +struct _zone *tipc_zones[256] = { NULL, }; +struct network tipc_net = { tipc_zones }; struct tipc_node *tipc_net_select_remote_node(u32 addr, u32 ref) { @@ -158,28 +159,12 @@ void tipc_net_send_external_routes(u32 dest) } } -static int net_init(void) -{ - memset(&tipc_net, 0, sizeof(tipc_net)); - tipc_net.zones = kcalloc(tipc_max_zones + 1, sizeof(struct _zone *), GFP_ATOMIC); - if (!tipc_net.zones) { - return -ENOMEM; - } - return 0; -} - static void net_stop(void) { u32 z_num; - if (!tipc_net.zones) - return; - - for (z_num = 1; z_num <= tipc_max_zones; z_num++) { + for (z_num = 1; z_num <= tipc_max_zones; z_num++) tipc_zone_delete(tipc_net.zones[z_num]); - } - kfree(tipc_net.zones); - tipc_net.zones = NULL; } static void net_route_named_msg(struct sk_buff *buf) @@ -282,9 +267,7 @@ int tipc_net_start(u32 addr) tipc_named_reinit(); tipc_port_reinit(); - if ((res = tipc_bearer_init()) || - (res = net_init()) || - (res = tipc_cltr_init()) || + if ((res = tipc_cltr_init()) || (res = tipc_bclink_init())) { return res; } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-03-02 18:33 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) Neil Horman @ 2010-03-03 16:51 ` Stephens, Allan 2010-03-03 18:31 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v3) Neil Horman 2010-03-08 20:19 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) David Miller 1 sibling, 1 reply; 28+ messages in thread From: Stephens, Allan @ 2010-03-03 16:51 UTC (permalink / raw) To: Neil Horman, netdev; +Cc: jon.maloy, tipc-discussion, davem Hi Neil: Your patch looks pretty good to me, and is definitely a much cleaner solution than what was discussed previously. The only improvements I can suggest are the following: - It looks like the check to see if tipc_bearers is NULL at the start of tipc_bearer_stop() is unnecessary, since it's now a static array. - The check to see if media_list is NULL at the start of tipc_register_media() needs to be replaced with a check to ensure tipc_mode is TIPC_NET_MODE. (Unlike the case with the tipc_bearers check, we still need some sort of test in place to cause the routine to fail if someone uses TIPC's native API to try registering an add-on media type before TIPC is ready to handle it.) Other than these minor points I can't see anything else that would cause problems. Regards, Al > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Tuesday, March 02, 2010 1:33 PM > To: netdev@vger.kernel.org > Cc: jon.maloy@ericsson.com; Stephens, Allan; > tipc-discussion@lists.sourceforge.net; davem@davemloft.net > Subject: Re: [PATCH]: tipc: Fix oops on send prior to > entering networked mode (v2) > > Ok, after some debate, heres version 2 of this patch. Its a > complete rewrite. > I started with the patches we'd proposed previously, then > realized theres a much easier and elegant way to handle this, > which I've implemented below. > > Fix TIPC to disallow sending to remote addresses prior to > entering NET_MODE > > user programs can oops the kernel by sending datagrams via > AF_TIPC prior to entering networked mode. The following > backtrace has been observed: > > ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" > [exception RIP: tipc_node_select_next_hop+90] > RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 > RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 > RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 > R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 > R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #4 [ffff81002d9a5ab0] tipc_node_select_next_hop at ffffffff8869d3b1 > #5 [ffff81002d9a5ae0] tipc_link_send_sections_fast at ffffffff88698558 > #6 [ffff81002d9a5be0] tipc_forward2port at ffffffff8869eb1d > #7 [ffff81002d9a5c10] tipc_send2port at ffffffff8869eb79 > #8 [ffff81002d9a5c30] send_msg at ffffffff886a1d0b > #9 [ffff81002d9a5cb0] sock_sendmsg at ffffffff80055261 > RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 > RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 > RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 > RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 > ORIG_RAX: 000000000000002c CS: 0033 SS: 002b > > What happens is that, when the tipc module in inserted it > enters a standalone node mode in which communication to its > own address is allowed <0.0.0> but not to other addresses, > since the appropriate data structures have not been allocated > yet (specifically the tipc_net pointer). There is nothing > stopping a client from trying to send such a message however, > and if that happens, we attempt to dereference tipc_net.zones > while the pointer is still NULL, and explode. The fix is > pretty straightforward. Since these oopses all arise from > the dereference of global pointers prior to their assignment > to allocated values, and since these allocations are small > (about 2k total), lets convert these pointers to static > arrays of the appropriate size. All the accesses to these > bits consider 0/NULL to be a non match when searching, so all > the lookups still work properly, and there is no longer a > chance of a bad dererence anywhere. As a bonus, this lets us > eliminate the setup/teardown routines for those pointers, and > elimnates the need to preform any locking around them to > prevent access while their being allocated/freed. > > I've updated the tipc_net structure to behave this way to fix > the exact reported problem, and also fixed up the > tipc_bearers and media_list arrays to fix an obvious simmilar > problem that arises from issuing tipc-config commands to > manipulate bearers/links prior to entering networked mode > > I've tested this for a few hours by running the sanity tests > and stress test with the tipcutils suite, and nothing has > fallen over. There have been a few lockdep warnings, but > those were there before, and can be addressed later, as they > didn't actually result in any deadlock. > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > CC: Allan Stephens <allan.stephens@windriver.com> > CC: David S. Miller <davem@davemloft.net> > CC: tipc-discussion@lists.sourceforge.net > > > bearer.c | 28 ++-------------------------- > bearer.h | 2 +- > net.c | 25 ++++--------------------- > 3 files changed, 7 insertions(+), 48 deletions(-) > > diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index > 327011f..d1b9226 100644 > --- a/net/tipc/bearer.c > +++ b/net/tipc/bearer.c > @@ -45,10 +45,10 @@ > > #define MAX_ADDR_STR 32 > > -static struct media *media_list = NULL; > +static struct media media_list[MAX_MEDIA]; > static u32 media_count = 0; > > -struct bearer *tipc_bearers = NULL; > +struct bearer tipc_bearers[MAX_BEARERS]; > > /** > * media_name_valid - validate media name @@ -660,26 +660,6 > @@ int tipc_disable_bearer(const char *name) > > > > -int tipc_bearer_init(void) > -{ > - int res; > - > - write_lock_bh(&tipc_net_lock); > - tipc_bearers = kcalloc(MAX_BEARERS, sizeof(struct > bearer), GFP_ATOMIC); > - media_list = kcalloc(MAX_MEDIA, sizeof(struct media), > GFP_ATOMIC); > - if (tipc_bearers && media_list) { > - res = 0; > - } else { > - kfree(tipc_bearers); > - kfree(media_list); > - tipc_bearers = NULL; > - media_list = NULL; > - res = -ENOMEM; > - } > - write_unlock_bh(&tipc_net_lock); > - return res; > -} > - > void tipc_bearer_stop(void) > { > u32 i; > @@ -695,10 +675,6 @@ void tipc_bearer_stop(void) > if (tipc_bearers[i].active) > bearer_disable(tipc_bearers[i].publ.name); > } > - kfree(tipc_bearers); > - kfree(media_list); > - tipc_bearers = NULL; > - media_list = NULL; > media_count = 0; > } > > diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index > ca57348..000228e 100644 > --- a/net/tipc/bearer.h > +++ b/net/tipc/bearer.h > @@ -114,7 +114,7 @@ struct bearer_name { > > struct link; > > -extern struct bearer *tipc_bearers; > +extern struct bearer tipc_bearers[]; > > void tipc_media_addr_printf(struct print_buf *pb, struct > tipc_media_addr *a); struct sk_buff > *tipc_media_get_names(void); diff --git a/net/tipc/net.c > b/net/tipc/net.c index 7906608..f25b1cd 100644 > --- a/net/tipc/net.c > +++ b/net/tipc/net.c > @@ -116,7 +116,8 @@ > */ > > DEFINE_RWLOCK(tipc_net_lock); > -struct network tipc_net = { NULL }; > +struct _zone *tipc_zones[256] = { NULL, }; struct network > tipc_net = { > +tipc_zones }; > > struct tipc_node *tipc_net_select_remote_node(u32 addr, u32 > ref) { @@ -158,28 +159,12 @@ void > tipc_net_send_external_routes(u32 dest) > } > } > > -static int net_init(void) > -{ > - memset(&tipc_net, 0, sizeof(tipc_net)); > - tipc_net.zones = kcalloc(tipc_max_zones + 1, > sizeof(struct _zone *), GFP_ATOMIC); > - if (!tipc_net.zones) { > - return -ENOMEM; > - } > - return 0; > -} > - > static void net_stop(void) > { > u32 z_num; > > - if (!tipc_net.zones) > - return; > - > - for (z_num = 1; z_num <= tipc_max_zones; z_num++) { > + for (z_num = 1; z_num <= tipc_max_zones; z_num++) > tipc_zone_delete(tipc_net.zones[z_num]); > - } > - kfree(tipc_net.zones); > - tipc_net.zones = NULL; > } > > static void net_route_named_msg(struct sk_buff *buf) @@ > -282,9 +267,7 @@ int tipc_net_start(u32 addr) > tipc_named_reinit(); > tipc_port_reinit(); > > - if ((res = tipc_bearer_init()) || > - (res = net_init()) || > - (res = tipc_cltr_init()) || > + if ((res = tipc_cltr_init()) || > (res = tipc_bclink_init())) { > return res; > } > ------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v3) 2010-03-03 16:51 ` Stephens, Allan @ 2010-03-03 18:31 ` Neil Horman 2010-03-04 8:40 ` David Miller 0 siblings, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-03-03 18:31 UTC (permalink / raw) To: Stephens, Allan; +Cc: netdev, jon.maloy, tipc-discussion, davem Ok, version 3 of the path, taking comments into account. Notes: 1) Removed now-meaningless check from tipc_bearer_stop 2) Added tipc_mode check at tipc_register_media. I techincally think this isn't needed, since we're only writing an entry to the media_list here, and don't depend on anything else, but it doesn't hurt. In fact, I'm wondering if this isn't an example of how the whole tipc_mode stuff might become unneeded. If we can make all the relevant data structures statically allocated, TIPC_NODE_MODE might just be the trivial case of TIPC_NET_MODE in which only the local node is reachable. Thats another debate though. Summary: Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE user programs can oops the kernel by sending datagrams via AF_TIPC prior to entering networked mode. The following backtrace has been observed: ID: 13459 TASK: ffff810014640040 CPU: 0 COMMAND: "tipc-client" [exception RIP: tipc_node_select_next_hop+90] RIP: ffffffff8869d3c3 RSP: ffff81002d9a5ab8 RFLAGS: 00010202 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000001001001 RBP: 0000000001001001 R8: 0074736575716552 R9: 0000000000000000 R10: ffff81003fbd0680 R11: 00000000000000c8 R12: 0000000000000008 R13: 0000000000000001 R14: 0000000000000001 R15: ffff810015c6ca00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 RIP: 0000003cbd8d49a3 RSP: 00007fffc84e0be8 RFLAGS: 00010206 RAX: 000000000000002c RBX: ffffffff8005d116 RCX: 0000000000000000 RDX: 0000000000000008 RSI: 00007fffc84e0c00 RDI: 0000000000000003 RBP: 0000000000000000 R8: 00007fffc84e0c10 R9: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc84e0d10 R14: 0000000000000000 R15: 00007fffc84e0c30 ORIG_RAX: 000000000000002c CS: 0033 SS: 002b What happens is that, when the tipc module in inserted it enters a standalone node mode in which communication to its own address is allowed <0.0.0> but not to other addresses, since the appropriate data structures have not been allocated yet (specifically the tipc_net pointer). There is nothing stopping a client from trying to send such a message however, and if that happens, we attempt to dereference tipc_net.zones while the pointer is still NULL, and explode. The fix is pretty straightforward. Since these oopses all arise from the dereference of global pointers prior to their assignment to allocated values, and since these allocations are small (about 2k total), lets convert these pointers to static arrays of the appropriate size. All the accesses to these bits consider 0/NULL to be a non match when searching, so all the lookups still work properly, and there is no longer a chance of a bad dererence anywhere. As a bonus, this lets us eliminate the setup/teardown routines for those pointers, and elimnates the need to preform any locking around them to prevent access while their being allocated/freed. I've updated the tipc_net structure to behave this way to fix the exact reported problem, and also fixed up the tipc_bearers and media_list arrays to fix an obvious simmilar problem that arises from issuing tipc-config commands to manipulate bearers/links prior to entering networked mode I've tested this for a few hours by running the sanity tests and stress test with the tipcutils suite, and nothing has fallen over. There have been a few lockdep warnings, but those were there before, and can be addressed later, as they didn't actually result in any deadlock. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Allan Stephens <allan.stephens@windriver.com> CC: David S. Miller <davem@davemloft.net> CC: tipc-discussion@lists.sourceforge.net bearer.c | 37 ++++++------------------------------- bearer.h | 2 +- net.c | 25 ++++--------------------- 3 files changed, 11 insertions(+), 53 deletions(-) diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 327011f..7809137 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -45,10 +45,10 @@ #define MAX_ADDR_STR 32 -static struct media *media_list = NULL; +static struct media media_list[MAX_MEDIA]; static u32 media_count = 0; -struct bearer *tipc_bearers = NULL; +struct bearer tipc_bearers[MAX_BEARERS]; /** * media_name_valid - validate media name @@ -108,9 +108,11 @@ int tipc_register_media(u32 media_type, int res = -EINVAL; write_lock_bh(&tipc_net_lock); - if (!media_list) - goto exit; + if (tipc_mode != TIPC_NET_MODE) { + warn("Media <%s> rejected, not in networked mode yet\n", name); + goto exit; + } if (!media_name_valid(name)) { warn("Media <%s> rejected, illegal name\n", name); goto exit; @@ -660,33 +662,10 @@ int tipc_disable_bearer(const char *name) -int tipc_bearer_init(void) -{ - int res; - - write_lock_bh(&tipc_net_lock); - tipc_bearers = kcalloc(MAX_BEARERS, sizeof(struct bearer), GFP_ATOMIC); - media_list = kcalloc(MAX_MEDIA, sizeof(struct media), GFP_ATOMIC); - if (tipc_bearers && media_list) { - res = 0; - } else { - kfree(tipc_bearers); - kfree(media_list); - tipc_bearers = NULL; - media_list = NULL; - res = -ENOMEM; - } - write_unlock_bh(&tipc_net_lock); - return res; -} - void tipc_bearer_stop(void) { u32 i; - if (!tipc_bearers) - return; - for (i = 0; i < MAX_BEARERS; i++) { if (tipc_bearers[i].active) tipc_bearers[i].publ.blocked = 1; @@ -695,10 +674,6 @@ void tipc_bearer_stop(void) if (tipc_bearers[i].active) bearer_disable(tipc_bearers[i].publ.name); } - kfree(tipc_bearers); - kfree(media_list); - tipc_bearers = NULL; - media_list = NULL; media_count = 0; } diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index ca57348..000228e 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -114,7 +114,7 @@ struct bearer_name { struct link; -extern struct bearer *tipc_bearers; +extern struct bearer tipc_bearers[]; void tipc_media_addr_printf(struct print_buf *pb, struct tipc_media_addr *a); struct sk_buff *tipc_media_get_names(void); diff --git a/net/tipc/net.c b/net/tipc/net.c index 7906608..f25b1cd 100644 --- a/net/tipc/net.c +++ b/net/tipc/net.c @@ -116,7 +116,8 @@ */ DEFINE_RWLOCK(tipc_net_lock); -struct network tipc_net = { NULL }; +struct _zone *tipc_zones[256] = { NULL, }; +struct network tipc_net = { tipc_zones }; struct tipc_node *tipc_net_select_remote_node(u32 addr, u32 ref) { @@ -158,28 +159,12 @@ void tipc_net_send_external_routes(u32 dest) } } -static int net_init(void) -{ - memset(&tipc_net, 0, sizeof(tipc_net)); - tipc_net.zones = kcalloc(tipc_max_zones + 1, sizeof(struct _zone *), GFP_ATOMIC); - if (!tipc_net.zones) { - return -ENOMEM; - } - return 0; -} - static void net_stop(void) { u32 z_num; - if (!tipc_net.zones) - return; - - for (z_num = 1; z_num <= tipc_max_zones; z_num++) { + for (z_num = 1; z_num <= tipc_max_zones; z_num++) tipc_zone_delete(tipc_net.zones[z_num]); - } - kfree(tipc_net.zones); - tipc_net.zones = NULL; } static void net_route_named_msg(struct sk_buff *buf) @@ -282,9 +267,7 @@ int tipc_net_start(u32 addr) tipc_named_reinit(); tipc_port_reinit(); - if ((res = tipc_bearer_init()) || - (res = net_init()) || - (res = tipc_cltr_init()) || + if ((res = tipc_cltr_init()) || (res = tipc_bclink_init())) { return res; } ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v3) 2010-03-03 18:31 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v3) Neil Horman @ 2010-03-04 8:40 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2010-03-04 8:40 UTC (permalink / raw) To: nhorman; +Cc: allan.stephens, netdev, jon.maloy, tipc-discussion From: Neil Horman <nhorman@tuxdriver.com> Date: Wed, 3 Mar 2010 13:31:23 -0500 > Ok, version 3 of the path, taking comments into account. Applied. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-03-02 18:33 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) Neil Horman 2010-03-03 16:51 ` Stephens, Allan @ 2010-03-08 20:19 ` David Miller 2010-03-08 20:49 ` Neil Horman 2010-03-08 21:13 ` Neil Horman 1 sibling, 2 replies; 28+ messages in thread From: David Miller @ 2010-03-08 20:19 UTC (permalink / raw) To: nhorman; +Cc: netdev, jon.maloy, allan.stephens, tipc-discussion Doesn't apply to net-2.6, Neil please respin. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-03-08 20:19 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) David Miller @ 2010-03-08 20:49 ` Neil Horman 2010-03-08 21:13 ` Neil Horman 1 sibling, 0 replies; 28+ messages in thread From: Neil Horman @ 2010-03-08 20:49 UTC (permalink / raw) To: David Miller; +Cc: netdev, jon.maloy, allan.stephens, tipc-discussion On Mon, Mar 08, 2010 at 12:19:42PM -0800, David Miller wrote: > > Doesn't apply to net-2.6, Neil please respin. > Will do. I'll repost in a few hours Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-03-08 20:19 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) David Miller 2010-03-08 20:49 ` Neil Horman @ 2010-03-08 21:13 ` Neil Horman 2010-03-08 21:26 ` David Miller 1 sibling, 1 reply; 28+ messages in thread From: Neil Horman @ 2010-03-08 21:13 UTC (permalink / raw) To: David Miller; +Cc: netdev, jon.maloy, allan.stephens, tipc-discussion On Mon, Mar 08, 2010 at 12:19:42PM -0800, David Miller wrote: > > Doesn't apply to net-2.6, Neil please respin. > Yes, sorry David, didn't mean to reply so quickly before. I just looked at your net-2.6 tree, and d0021b252eaf65ca07ed14f0d66425dd9ccab9a6 is in there just fine. Not sure if I'm missing something or not, but as far as I can see you've already apply the patch properly. Regards Neil ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) 2010-03-08 21:13 ` Neil Horman @ 2010-03-08 21:26 ` David Miller 0 siblings, 0 replies; 28+ messages in thread From: David Miller @ 2010-03-08 21:26 UTC (permalink / raw) To: nhorman; +Cc: netdev, jon.maloy, allan.stephens, tipc-discussion From: Neil Horman <nhorman@tuxdriver.com> Date: Mon, 8 Mar 2010 16:13:08 -0500 > On Mon, Mar 08, 2010 at 12:19:42PM -0800, David Miller wrote: >> >> Doesn't apply to net-2.6, Neil please respin. >> > Yes, sorry David, didn't mean to reply so quickly before. I just looked at your > net-2.6 tree, and d0021b252eaf65ca07ed14f0d66425dd9ccab9a6 is in there just > fine. Not sure if I'm missing something or not, but as far as I can see you've > already apply the patch properly. Perfect. ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2010-03-08 21:26 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-02-19 19:40 [PATCH]: tipc: Fix oops on send prior to entering networked mode Neil Horman 2010-02-22 22:44 ` Stephens, Allan 2010-02-23 1:11 ` Neil Horman 2010-02-23 15:02 ` Stephens, Allan 2010-02-23 16:09 ` Neil Horman 2010-02-23 16:21 ` Stephens, Allan 2010-02-24 18:53 ` Neil Horman 2010-02-24 19:05 ` Stephens, Allan 2010-02-24 21:15 ` Neil Horman 2010-02-25 1:38 ` David Miller 2010-02-25 14:24 ` Stephens, Allan 2010-02-25 15:06 ` David Miller 2010-02-25 16:24 ` Stephens, Allan 2010-02-25 15:13 ` David Miller 2010-02-25 15:23 ` Neil Horman 2010-02-25 20:33 ` Stephens, Allan 2010-02-25 21:14 ` Neil Horman 2010-02-24 21:19 ` Neil Horman 2010-02-25 1:34 ` David Miller 2010-02-25 1:42 ` Neil Horman 2010-03-02 18:33 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) Neil Horman 2010-03-03 16:51 ` Stephens, Allan 2010-03-03 18:31 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v3) Neil Horman 2010-03-04 8:40 ` David Miller 2010-03-08 20:19 ` [PATCH]: tipc: Fix oops on send prior to entering networked mode (v2) David Miller 2010-03-08 20:49 ` Neil Horman 2010-03-08 21:13 ` Neil Horman 2010-03-08 21:26 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.