* [RFC PATCH] network: return errors if we know tcp_connect failed @ 2010-11-11 21:03 Eric Paris 2010-11-11 21:14 ` Eric Dumazet ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Eric Paris @ 2010-11-11 21:03 UTC (permalink / raw) To: netdev, linux-kernel; +Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber THIS PATCH IS VERY POSSIBLY WRONG! But if it is I want some feedback. Basically what I found was that if I added an iptables rule like so: iptables -A OUTPUT -p tcp --dport 80 -j DROP And then ran a web browser like links it would just hang on 'establishing connection.' I expected that the application would immediately, or at least very quickly, get notified that the connect failed. This waiting for timeout would be expected if something else dropped the SYN or if we were dropping the SYN/ACK packet coming back, but I figured if we knew we threw away the SYN we knew right away that the connection was denied and we should be able to indicate that to the application. Yes, I realize this is little different than if the SYN was dropped in the first network device, but it is different because we know what happened! We know that connect() call failed and that there isn't anything coming back. What I discovered was that we actually had 2 problems in making it possible. For userspace to quickly realize the connect failed. The first was a problem in the netfilter code which wasn't passing errors back up the stack correctly, due to what I believe to be a mistake in precedence rules. http://marc.info/?l=netfilter-devel&m=128950262021804&w=2 And the second was that tcp_connect() was just ignoring the return value from tcp_transmit_skb(). Maybe this was intentional but I really wish we could find out that connect failed long before the minutes long timeout. Once I fixed both of those issues I find that links gets denied (with EPERM) immediately when it calls connect(). Is this wrong? Is this bad to tell userspace more quickly what happened? Does passing this error code back up the stack here break something else? Why do some functions seem to pay attention to tcp_transmit_skb() return codes and some functions just ignore it? What do others think? NOT-AT-ALL-SIGNED-OFF-BY --- net/ipv4/tcp_output.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index e961522..67b8535 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2592,6 +2592,7 @@ int tcp_connect(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *buff; + int ret; tcp_connect_init(sk); @@ -2614,7 +2615,7 @@ int tcp_connect(struct sock *sk) sk->sk_wmem_queued += buff->truesize; sk_mem_charge(sk, buff->truesize); tp->packets_out += tcp_skb_pcount(buff); - tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); + ret = tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); /* We change tp->snd_nxt after the tcp_transmit_skb() call * in order to make this packet get counted in tcpOutSegs. @@ -2626,7 +2627,7 @@ int tcp_connect(struct sock *sk) /* Timer for repeating the SYN until an answer. */ inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, inet_csk(sk)->icsk_rto, TCP_RTO_MAX); - return 0; + return ret; } EXPORT_SYMBOL(tcp_connect); ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-11 21:03 [RFC PATCH] network: return errors if we know tcp_connect failed Eric Paris @ 2010-11-11 21:14 ` Eric Dumazet 2010-11-11 21:58 ` Hua Zhong 2010-11-12 17:46 ` Alexey Kuznetsov 2 siblings, 0 replies; 31+ messages in thread From: Eric Dumazet @ 2010-11-11 21:14 UTC (permalink / raw) To: Eric Paris Cc: netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, kaber Le jeudi 11 novembre 2010 à 16:03 -0500, Eric Paris a écrit : > THIS PATCH IS VERY POSSIBLY WRONG! But if it is I want some feedback. > > Basically what I found was that if I added an iptables rule like so: > > iptables -A OUTPUT -p tcp --dport 80 -j DROP > > And then ran a web browser like links it would just hang on 'establishing > connection.' I expected that the application would immediately, or at least > very quickly, get notified that the connect failed. This waiting for timeout > would be expected if something else dropped the SYN or if we were dropping the > SYN/ACK packet coming back, but I figured if we knew we threw away the SYN we knew > right away that the connection was denied and we should be able to indicate > that to the application. Yes, I realize this is little different than if the > SYN was dropped in the first network device, but it is different because we > know what happened! We know that connect() call failed and that there isn't > anything coming back. > > What I discovered was that we actually had 2 problems in making it possible. > For userspace to quickly realize the connect failed. The first was a problem > in the netfilter code which wasn't passing errors back up the stack correctly, > due to what I believe to be a mistake in precedence rules. > > http://marc.info/?l=netfilter-devel&m=128950262021804&w=2 > > And the second was that tcp_connect() was just ignoring the return value from > tcp_transmit_skb(). Maybe this was intentional but I really wish we could > find out that connect failed long before the minutes long timeout. Once I > fixed both of those issues I find that links gets denied (with EPERM) > immediately when it calls connect(). Is this wrong? Is this bad to tell > userspace more quickly what happened? Does passing this error code back up > the stack here break something else? Why do some functions seem to pay > attention to tcp_transmit_skb() return codes and some functions just ignore > it? What do others think? > I think its an interesting idea, but a temporary memory shortage would abort the connect(). We could imagine some special handling of the first packet of a flow being DROPED for whatever reason (flow control...) So it needs some refinement I think. SYN packets should be allowed to be re-transmitted before saying a TCP connect() cannot succeed. ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-11 21:03 [RFC PATCH] network: return errors if we know tcp_connect failed Eric Paris 2010-11-11 21:14 ` Eric Dumazet @ 2010-11-11 21:58 ` Hua Zhong 2010-11-12 7:36 ` Patrick McHardy 2010-11-12 16:08 ` Eric Paris 2010-11-12 17:46 ` Alexey Kuznetsov 2 siblings, 2 replies; 31+ messages in thread From: Hua Zhong @ 2010-11-11 21:58 UTC (permalink / raw) To: 'Eric Paris', netdev, linux-kernel Cc: davem, kuznet, pekkas, jmorris, yoshfuji, kaber > Yes, I realize this is little different than if the > SYN was dropped in the first network device, but it is different > because we know what happened! We know that connect() call failed > and that there isn't anything coming back. I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). It does not only make sense, but also is a highly useful testing technique that we use -j DROP in OUTPUT to emulate network losses and see how the application behaves. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-11 21:58 ` Hua Zhong @ 2010-11-12 7:36 ` Patrick McHardy 2010-11-12 23:14 ` Hua Zhong 2010-11-12 16:08 ` Eric Paris 1 sibling, 1 reply; 31+ messages in thread From: Patrick McHardy @ 2010-11-12 7:36 UTC (permalink / raw) To: Hua Zhong Cc: 'Eric Paris', netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji On 11.11.2010 22:58, Hua Zhong wrote: >> Yes, I realize this is little different than if the >> SYN was dropped in the first network device, but it is different >> because we know what happened! We know that connect() call failed >> and that there isn't anything coming back. > > I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). It sends an ICMP error or TCP reset. Interpretation is up to TCP. ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 7:36 ` Patrick McHardy @ 2010-11-12 23:14 ` Hua Zhong 2010-11-15 10:32 ` Patrick McHardy 0 siblings, 1 reply; 31+ messages in thread From: Hua Zhong @ 2010-11-12 23:14 UTC (permalink / raw) To: 'Patrick McHardy' Cc: 'Eric Paris', netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji > On 11.11.2010 22:58, Hua Zhong wrote: > >> Yes, I realize this is little different than if the > >> SYN was dropped in the first network device, but it is different > >> because we know what happened! We know that connect() call failed > >> and that there isn't anything coming back. > > > > I would argue that -j DROP should behave exactly as the packet is > dropped in the network, while -j REJECT should signal the failure to > the application as soon as possible (which it doesn't seem to do). > > It sends an ICMP error or TCP reset. Interpretation is up to TCP. Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or TCP reset. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 23:14 ` Hua Zhong @ 2010-11-15 10:32 ` Patrick McHardy 2010-11-15 15:47 ` Eric Paris 0 siblings, 1 reply; 31+ messages in thread From: Patrick McHardy @ 2010-11-15 10:32 UTC (permalink / raw) To: Hua Zhong Cc: 'Eric Paris', netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji On 13.11.2010 00:14, Hua Zhong wrote: >> On 11.11.2010 22:58, Hua Zhong wrote: >>>> Yes, I realize this is little different than if the >>>> SYN was dropped in the first network device, but it is different >>>> because we know what happened! We know that connect() call failed >>>> and that there isn't anything coming back. >>> >>> I would argue that -j DROP should behave exactly as the packet is >> dropped in the network, while -j REJECT should signal the failure to >> the application as soon as possible (which it doesn't seem to do). >> >> It sends an ICMP error or TCP reset. Interpretation is up to TCP. > > Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or > TCP reset. Of course there is. ICMP (default): iptables -A OUTPUT -p tcp -j REJECT TCP reset: iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset The second one will cause a hard error for the connection. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 10:32 ` Patrick McHardy @ 2010-11-15 15:47 ` Eric Paris 2010-11-15 15:57 ` Patrick McHardy 2010-11-15 20:00 ` Alexey Kuznetsov 0 siblings, 2 replies; 31+ messages in thread From: Eric Paris @ 2010-11-15 15:47 UTC (permalink / raw) To: Patrick McHardy Cc: Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji On Mon, 2010-11-15 at 11:32 +0100, Patrick McHardy wrote: > On 13.11.2010 00:14, Hua Zhong wrote: > >> On 11.11.2010 22:58, Hua Zhong wrote: > >>>> Yes, I realize this is little different than if the > >>>> SYN was dropped in the first network device, but it is different > >>>> because we know what happened! We know that connect() call failed > >>>> and that there isn't anything coming back. > >>> > >>> I would argue that -j DROP should behave exactly as the packet is > >> dropped in the network, while -j REJECT should signal the failure to > >> the application as soon as possible (which it doesn't seem to do). > >> > >> It sends an ICMP error or TCP reset. Interpretation is up to TCP. > > > > Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or > > TCP reset. > > Of course there is. > > ICMP (default): > > iptables -A OUTPUT -p tcp -j REJECT > > TCP reset: > > iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset > > The second one will cause a hard error for the connection. Well I'm (I guess?) surprised that the --reject-with icmp doesn't do anything with a local outgoing connection but --reject-with tcp-reset does something like what I'm looking for. I notice the heavy lifting for this is done in net/ipv4/netfilter/ipt_REJECT.c::send_rest() (and something very similar for IPv6) I really don't want to duplicate that code into SELinux (for obvious reasons) and I'm wondering if anyone has objections to me making it available outside of netlink and/or suggestions on how to make that code available outside of netfilter (aka what header to expose it, and does it still make logical sense in ipt_REJECT.c or somewhere else?) -Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 15:47 ` Eric Paris @ 2010-11-15 15:57 ` Patrick McHardy 2010-11-15 16:04 ` Patrick McHardy 2010-11-15 16:36 ` Patrick McHardy 2010-11-15 20:00 ` Alexey Kuznetsov 1 sibling, 2 replies; 31+ messages in thread From: Patrick McHardy @ 2010-11-15 15:57 UTC (permalink / raw) To: Eric Paris Cc: Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji On 15.11.2010 16:47, Eric Paris wrote: > On Mon, 2010-11-15 at 11:32 +0100, Patrick McHardy wrote: >> On 13.11.2010 00:14, Hua Zhong wrote: >>>> On 11.11.2010 22:58, Hua Zhong wrote: >>>>>> Yes, I realize this is little different than if the >>>>>> SYN was dropped in the first network device, but it is different >>>>>> because we know what happened! We know that connect() call failed >>>>>> and that there isn't anything coming back. >>>>> >>>>> I would argue that -j DROP should behave exactly as the packet is >>>> dropped in the network, while -j REJECT should signal the failure to >>>> the application as soon as possible (which it doesn't seem to do). >>>> >>>> It sends an ICMP error or TCP reset. Interpretation is up to TCP. >>> >>> Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or >>> TCP reset. >> >> Of course there is. >> >> ICMP (default): >> >> iptables -A OUTPUT -p tcp -j REJECT >> >> TCP reset: >> >> iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset >> >> The second one will cause a hard error for the connection. > > Well I'm (I guess?) surprised that the --reject-with icmp doesn't do > anything with a local outgoing connection but --reject-with tcp-reset > does something like what I'm looking for. > > I notice the heavy lifting for this is done in > net/ipv4/netfilter/ipt_REJECT.c::send_rest() > (and something very similar for IPv6) > > I really don't want to duplicate that code into SELinux (for obvious > reasons) and I'm wondering if anyone has objections to me making it > available outside of netlink and/or suggestions on how to make that code > available outside of netfilter (aka what header to expose it, and does > it still make logical sense in ipt_REJECT.c or somewhere else?) I don't think having SELinux sending packets to handle local connections is a very elegant design, its not a firewall after all. What's wrong with reacting only to specific errno codes in tcp_connect()? You could f.i. return -ECONNREFUSED from SELinux, that one is pretty much guaranteed not to occur in the network stack itself and can be returned directly. That would need minor changes to nf_hook_slow so we can encode errno values in the upper 16 bits of the verdict, as we already do with the queue number. The added benefit is that we don't have to return EPERM anymore when f.i. rerouting fails. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 15:57 ` Patrick McHardy @ 2010-11-15 16:04 ` Patrick McHardy 2010-11-15 16:36 ` Patrick McHardy 1 sibling, 0 replies; 31+ messages in thread From: Patrick McHardy @ 2010-11-15 16:04 UTC (permalink / raw) To: Eric Paris Cc: Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji On 15.11.2010 16:57, Patrick McHardy wrote: > On 15.11.2010 16:47, Eric Paris wrote: >>> iptables -A OUTPUT -p tcp -j REJECT --reject-with tcp-reset >>> >>> The second one will cause a hard error for the connection. >> >> Well I'm (I guess?) surprised that the --reject-with icmp doesn't do >> anything with a local outgoing connection but --reject-with tcp-reset >> does something like what I'm looking for. >> >> I notice the heavy lifting for this is done in >> net/ipv4/netfilter/ipt_REJECT.c::send_rest() >> (and something very similar for IPv6) >> >> I really don't want to duplicate that code into SELinux (for obvious >> reasons) and I'm wondering if anyone has objections to me making it >> available outside of netlink and/or suggestions on how to make that code >> available outside of netfilter (aka what header to expose it, and does >> it still make logical sense in ipt_REJECT.c or somewhere else?) > > I don't think having SELinux sending packets to handle local > connections is a very elegant design, its not a firewall after > all. What's wrong with reacting only to specific errno codes > in tcp_connect()? You could f.i. return -ECONNREFUSED from > SELinux, that one is pretty much guaranteed not to occur in > the network stack itself and can be returned directly. One more note: there is also the problem that the RST might never reach the socket, f.i. because netfilter drops it, or TC actions reroute it etc. With netfilter users are expected to make sure the entire combination of network features does what the expect, but that's probably not what you want for SELinux. > That would need minor changes to nf_hook_slow so we can > encode errno values in the upper 16 bits of the verdict, > as we already do with the queue number. The added benefit > is that we don't have to return EPERM anymore when f.i. > rerouting fails. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 15:57 ` Patrick McHardy 2010-11-15 16:04 ` Patrick McHardy @ 2010-11-15 16:36 ` Patrick McHardy 2010-11-15 16:46 ` David Miller 1 sibling, 1 reply; 31+ messages in thread From: Patrick McHardy @ 2010-11-15 16:36 UTC (permalink / raw) To: Eric Paris Cc: Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, Netfilter Development Mailinglist [-- Attachment #1: Type: text/plain, Size: 1599 bytes --] On 15.11.2010 16:57, Patrick McHardy wrote: > On 15.11.2010 16:47, Eric Paris wrote: >> I notice the heavy lifting for this is done in >> net/ipv4/netfilter/ipt_REJECT.c::send_rest() >> (and something very similar for IPv6) >> >> I really don't want to duplicate that code into SELinux (for obvious >> reasons) and I'm wondering if anyone has objections to me making it >> available outside of netlink and/or suggestions on how to make that code >> available outside of netfilter (aka what header to expose it, and does >> it still make logical sense in ipt_REJECT.c or somewhere else?) > > I don't think having SELinux sending packets to handle local > connections is a very elegant design, its not a firewall after > all. What's wrong with reacting only to specific errno codes > in tcp_connect()? You could f.i. return -ECONNREFUSED from > SELinux, that one is pretty much guaranteed not to occur in > the network stack itself and can be returned directly. > > That would need minor changes to nf_hook_slow so we can > encode errno values in the upper 16 bits of the verdict, > as we already do with the queue number. The added benefit > is that we don't have to return EPERM anymore when f.i. > rerouting fails. Patch for demonstration purposes attached. I've modified the MARK target so it returns NF_DROP with an errno code of -ECONNREFUSED: # iptables -A OUTPUT -d 1.2.3.4 -j MARK --set-mark 1 # ping 1.2.3.4 PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data. ping: sendmsg: Connection refused # telnet 1.2.3.4 Trying 1.2.3.4... telnet: Unable to connect to remote host: Connection refused [-- Attachment #2: x --] [-- Type: text/plain, Size: 2252 bytes --] diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 89341c3..ef2af8f 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -33,6 +33,8 @@ #define NF_QUEUE_NR(x) ((((x) << NF_VERDICT_BITS) & NF_VERDICT_QMASK) | NF_QUEUE) +#define NF_DROP_ERR(x) (((-x) << NF_VERDICT_BITS) | NF_DROP) + /* only for userspace compatibility */ #ifndef __KERNEL__ /* Generic cache responses from hook functions. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 05b1ecf..bb8f547 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2592,6 +2592,7 @@ int tcp_connect(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); struct sk_buff *buff; + int err; tcp_connect_init(sk); @@ -2614,7 +2615,9 @@ int tcp_connect(struct sock *sk) sk->sk_wmem_queued += buff->truesize; sk_mem_charge(sk, buff->truesize); tp->packets_out += tcp_skb_pcount(buff); - tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); + err = tcp_transmit_skb(sk, buff, 1, sk->sk_allocation); + if (err == -ECONNREFUSED) + return err; /* We change tp->snd_nxt after the tcp_transmit_skb() call * in order to make this packet get counted in tcpOutSegs. diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 85dabb8..32fcbe2 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -173,9 +173,11 @@ next_hook: outdev, &elem, okfn, hook_thresh); if (verdict == NF_ACCEPT || verdict == NF_STOP) { ret = 1; - } else if (verdict == NF_DROP) { + } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) { kfree_skb(skb); - ret = -EPERM; + ret = -(verdict >> NF_VERDICT_BITS); + if (ret == 0) + ret = -EPERM; } else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) { if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn, verdict >> NF_VERDICT_BITS)) diff --git a/net/netfilter/xt_mark.c b/net/netfilter/xt_mark.c index 2334523..185330c 100644 --- a/net/netfilter/xt_mark.c +++ b/net/netfilter/xt_mark.c @@ -30,7 +30,7 @@ mark_tg(struct sk_buff *skb, const struct xt_action_param *par) const struct xt_mark_tginfo2 *info = par->targinfo; skb->mark = (skb->mark & ~info->mask) ^ info->mark; - return XT_CONTINUE; + return NF_DROP_ERR(-ECONNREFUSED); } static bool ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 16:36 ` Patrick McHardy @ 2010-11-15 16:46 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2010-11-15 16:46 UTC (permalink / raw) To: kaber Cc: eparis, hzhong, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, netfilter-devel From: Patrick McHardy <kaber@trash.net> Date: Mon, 15 Nov 2010 17:36:40 +0100 > Patch for demonstration purposes attached. I've modified the > MARK target so it returns NF_DROP with an errno code of > -ECONNREFUSED: I'm fine with the tcp_output.c changes. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-15 15:47 ` Eric Paris 2010-11-15 15:57 ` Patrick McHardy @ 2010-11-15 20:00 ` Alexey Kuznetsov 1 sibling, 0 replies; 31+ messages in thread From: Alexey Kuznetsov @ 2010-11-15 20:00 UTC (permalink / raw) To: Eric Paris Cc: Patrick McHardy, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji Hello! On Mon, Nov 15, 2010 at 10:47:46AM -0500, Eric Paris wrote: > Well I'm (I guess?) surprised that the --reject-with icmp doesn't do > anything with a local outgoing connection Yeah. This was "an obvious surpise": for local socket the first icmp error always arrives on locked socket and gets dropped. I even forgot about this (tcp-reset still works ok due to backlog) This makes the idea to respect error more attractive. Alexey ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-11 21:58 ` Hua Zhong 2010-11-12 7:36 ` Patrick McHardy @ 2010-11-12 16:08 ` Eric Paris 2010-11-12 16:15 ` Eric Dumazet 1 sibling, 1 reply; 31+ messages in thread From: Eric Paris @ 2010-11-12 16:08 UTC (permalink / raw) To: Hua Zhong Cc: netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet, paul.moore On Thu, 2010-11-11 at 13:58 -0800, Hua Zhong wrote: > > Yes, I realize this is little different than if the > > SYN was dropped in the first network device, but it is different > > because we know what happened! We know that connect() call failed > > and that there isn't anything coming back. > > I would argue that -j DROP should behave exactly as the packet is dropped in the network, while -j REJECT should signal the failure to the application as soon as possible (which it doesn't seem to do). > > It does not only make sense, but also is a highly useful testing technique that we use -j DROP in OUTPUT to emulate network losses and see how the application behaves. I guess I can be a bit more descriptive of my specific situation, although I'm not sure it matters. I don't actually plan to drop packets with -j REJECT or -j DROP, that's just a simple example everyone can see on their own machine. I plan to have the packets drop in the selinux netfilter hook. The SELinux hook uses NF_DROP/NF_ACCEPT just like any other netfilter hook. Maybe the answer is that I need to duplicate the -j REJECT type operations in the SELinux hook. -j REJECT doesn't do what I want today, but if that's the right way forward tell me and I'll look down that path. But the path I first started looking down rules in 2 distinct questions: 1) What should netfilter pass back up the stack. From my looking at this I see that nf_hook_slow() will convert NF_DROP into -EPERM and pass that back up the stack. Is this wrong? Should it more intelligently pass errors back up the stack? Maybe it needs an NF_REJECT as well as NF_DROP? NF_DROP returns 0 maybe and NF_REJECT return EPERM? 2) What should the generic TCP code (tcp_connect()) do if the skb failed to send. Should it return error codes back up the stack somehow or should they continue to be ignored? Obviously continuing to just ignore information we have doesn't make me happy (otherwise I wouldn't have started scratching this itch). But the point about ENOBUFS is well taken. Maybe I should make tcp_connect(), or the caller to tcp_connect() more intelligent about specific error codes? I'm looking for a path forward. If SELinux is rejecting the SYN packets on connect() I want to pass that info to userspace rather than just hanging. What's the best way to accomplish that? -Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* RE: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 16:08 ` Eric Paris @ 2010-11-12 16:15 ` Eric Dumazet 2010-11-12 16:35 ` David Lamparter 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 16:15 UTC (permalink / raw) To: Eric Paris Cc: Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > to send. Should it return error codes back up the stack somehow or > should they continue to be ignored? Obviously continuing to just ignore > information we have doesn't make me happy (otherwise I wouldn't have > started scratching this itch). But the point about ENOBUFS is well > taken. Maybe I should make tcp_connect(), or the caller to > tcp_connect() more intelligent about specific error codes? > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > on connect() I want to pass that info to userspace rather than just > hanging. What's the best way to accomplish that? > Eric, if you can differentiate a permanent reject, instead of a temporary one (congestion, or rate limiting, or ENOBUF, or ...), then yes, you could make tcp_connect() report to user the permanent error, and ignore the temporary one. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 16:15 ` Eric Dumazet @ 2010-11-12 16:35 ` David Lamparter 2010-11-12 16:53 ` Eric Paris 2010-11-12 16:54 ` Patrick McHardy 0 siblings, 2 replies; 31+ messages in thread From: David Lamparter @ 2010-11-12 16:35 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Paris, Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > > > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > > to send. Should it return error codes back up the stack somehow or > > should they continue to be ignored? Obviously continuing to just ignore > > information we have doesn't make me happy (otherwise I wouldn't have > > started scratching this itch). But the point about ENOBUFS is well > > taken. Maybe I should make tcp_connect(), or the caller to > > tcp_connect() more intelligent about specific error codes? > > > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > > on connect() I want to pass that info to userspace rather than just > > hanging. What's the best way to accomplish that? > > > > Eric, if you can differentiate a permanent reject, instead of a > temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > yes, you could make tcp_connect() report to user the permanent error, > and ignore the temporary one. If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT counterparts, which i guess they do but i didn't read the source ;), then SELinux should use NF_REJECT in my opinion. NF_DROP does exactly what the name says, it drops the packet aka basically puts it in /dev/null. As with writing to /dev/null, you don't get an error for that. Even more, if in the meantime the DROP rule does not match anymore, the 2nd or 3rd SYN from the connect() can come through and establish a connection (think of "-m statistic" & co.) This is very different from REJECT. If REJECT doesn't immediately get reported to the application, that *is* a bug, but last time i checked i got EPERM immediately. I would fix SELinux to use the same mechanism. -David ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 16:35 ` David Lamparter @ 2010-11-12 16:53 ` Eric Paris 2010-11-12 16:54 ` Patrick McHardy 1 sibling, 0 replies; 31+ messages in thread From: Eric Paris @ 2010-11-12 16:53 UTC (permalink / raw) To: David Lamparter Cc: Eric Dumazet, Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore On Fri, 2010-11-12 at 17:35 +0100, David Lamparter wrote: > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > > Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > > > > > 2) What should the generic TCP code (tcp_connect()) do if the skb failed > > > to send. Should it return error codes back up the stack somehow or > > > should they continue to be ignored? Obviously continuing to just ignore > > > information we have doesn't make me happy (otherwise I wouldn't have > > > started scratching this itch). But the point about ENOBUFS is well > > > taken. Maybe I should make tcp_connect(), or the caller to > > > tcp_connect() more intelligent about specific error codes? > > > > > > I'm looking for a path forward. If SELinux is rejecting the SYN packets > > > on connect() I want to pass that info to userspace rather than just > > > hanging. What's the best way to accomplish that? > > > > > > > Eric, if you can differentiate a permanent reject, instead of a > > temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > > yes, you could make tcp_connect() report to user the permanent error, > > and ignore the temporary one. > > If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT > counterparts, which i guess they do but i didn't read the source ;), > then SELinux should use NF_REJECT in my opinion. As it stands today there is no NF_REJECT. NF_DROP is the only (related) permitted return value from a netfilter hook. Maybe I need to change that fact though. > NF_DROP does exactly what the name says, it drops the packet aka > basically puts it in /dev/null. As with writing to /dev/null, you don't > get an error for that. Even more, if in the meantime the DROP rule does > not match anymore, the 2nd or 3rd SYN from the connect() can come > through and establish a connection (think of "-m statistic" & co.) > > This is very different from REJECT. > > If REJECT doesn't immediately get reported to the application, that *is* > a bug, but last time i checked i got EPERM immediately. I would fix > SELinux to use the same mechanism. I haven't looked at what -j REJECT does (or was intended to do) but it most certainly does not return an error to sys_connect(). Try it out. iptables -A OUTPUT -p tcp --dport 80 -j REJECT links www.google.com it just hangs on 'making connection' (exact same for -j DROP) If everyone agrees that's the wrong behavior (for -j REJECT) I'll work on fixing that (however is appropriate) and will change the SELinux code if needed after we've fixed the -j REJECT code. Obviously there's problems with my original way to fix the lack of error returns (namely that I would immediately EACCES for DROP as well as REJECT). I'm glad to hear that others seem to believe the current code is buggy and I'm not completely off my rocker to think that applications should be able to learn somehow that things fell down... -Eric ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 16:35 ` David Lamparter 2010-11-12 16:53 ` Eric Paris @ 2010-11-12 16:54 ` Patrick McHardy 2010-11-12 17:57 ` a problem tcp_v4_err() Alexey Kuznetsov 2010-11-12 21:16 ` [RFC PATCH] network: return errors if we know tcp_connect failed David Lamparter 1 sibling, 2 replies; 31+ messages in thread From: Patrick McHardy @ 2010-11-12 16:54 UTC (permalink / raw) To: David Lamparter Cc: Eric Dumazet, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, paul.moore Am 12.11.2010 17:35, schrieb David Lamparter: > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: >> Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : >> >>> 2) What should the generic TCP code (tcp_connect()) do if the skb failed >>> to send. Should it return error codes back up the stack somehow or >>> should they continue to be ignored? Obviously continuing to just ignore >>> information we have doesn't make me happy (otherwise I wouldn't have >>> started scratching this itch). But the point about ENOBUFS is well >>> taken. Maybe I should make tcp_connect(), or the caller to >>> tcp_connect() more intelligent about specific error codes? >>> >>> I'm looking for a path forward. If SELinux is rejecting the SYN packets >>> on connect() I want to pass that info to userspace rather than just >>> hanging. What's the best way to accomplish that? >>> >> >> Eric, if you can differentiate a permanent reject, instead of a >> temporary one (congestion, or rate limiting, or ENOBUF, or ...), then >> yes, you could make tcp_connect() report to user the permanent error, >> and ignore the temporary one. Indeed. We could even make the NF_DROP return value configurable by encoding it in the verdict. > If the netfilter targets DROP/REJECT match the NF_DROP/NF_REJECT > counterparts, which i guess they do but i didn't read the source ;), > then SELinux should use NF_REJECT in my opinion. There is no NF_REJECT. > NF_DROP does exactly what the name says, it drops the packet aka > basically puts it in /dev/null. As with writing to /dev/null, you don't > get an error for that. Even more, if in the meantime the DROP rule does > not match anymore, the 2nd or 3rd SYN from the connect() can come > through and establish a connection (think of "-m statistic" & co.) > > This is very different from REJECT. Returning NF_DROP results in -EPERM getting reported back. As Eric noticed, this is ignored for SYN packets. > If REJECT doesn't immediately get reported to the application, that *is* > a bug, but last time i checked i got EPERM immediately. I would fix > SELinux to use the same mechanism. NF_DROP returns -EPERM, the REJECT targets send packets to reject a connection. Whether this is reported immediately depends on the error and the protocol in question. Using a TCP reset immediately resets the connection. ^ permalink raw reply [flat|nested] 31+ messages in thread
* a problem tcp_v4_err() 2010-11-12 16:54 ` Patrick McHardy @ 2010-11-12 17:57 ` Alexey Kuznetsov 2010-11-12 18:12 ` Eric Dumazet 2010-11-12 21:16 ` [RFC PATCH] network: return errors if we know tcp_connect failed David Lamparter 1 sibling, 1 reply; 31+ messages in thread From: Alexey Kuznetsov @ 2010-11-12 17:57 UTC (permalink / raw) To: Patrick McHardy Cc: David Lamparter, Eric Dumazet, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore Hello! I looked at tcp_v4_err() and found something strange. Quite non-trivial operations are performed on unlocked sockets. It looks like at least this BUG_ON(): skb = tcp_write_queue_head(sk); BUG_ON(!skb); can be easily triggered. Do I miss something? Alexey ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 17:57 ` a problem tcp_v4_err() Alexey Kuznetsov @ 2010-11-12 18:12 ` Eric Dumazet 2010-11-12 18:21 ` Eric Dumazet 2010-11-12 18:29 ` Alexey Kuznetsov 0 siblings, 2 replies; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 18:12 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > Hello! > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > skb = tcp_write_queue_head(sk); > BUG_ON(!skb); > > can be easily triggered. > > Do I miss something? > Hi Alexey ! I see socket is locked around line 368, bh_lock_sock(sk); /* If too many ICMPs get dropped on busy * servers this needs to be solved differently. */ if (sock_owned_by_user(sk)) NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); Hmm, maybe some goto is missing ;) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:12 ` Eric Dumazet @ 2010-11-12 18:21 ` Eric Dumazet 2010-11-12 18:27 ` Eric Dumazet 2010-11-12 18:29 ` Alexey Kuznetsov 1 sibling, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 18:21 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore Le vendredi 12 novembre 2010 à 19:12 +0100, Eric Dumazet a écrit : > Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > > Hello! > > > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > > > skb = tcp_write_queue_head(sk); > > BUG_ON(!skb); > > > > can be easily triggered. > > > > Do I miss something? > > > > Hi Alexey ! > > I see socket is locked around line 368, > > bh_lock_sock(sk); > /* If too many ICMPs get dropped on busy > * servers this needs to be solved differently. > */ > if (sock_owned_by_user(sk)) > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > Hmm, maybe some goto is missing ;) > Well, goto is not missing. Why do you think BUG_ON(!skb) can be triggered ? We test before : if (seq != tp->snd_una || !icsk->icsk_retransmits || !icsk->icsk_backoff) break; So a concurrent user only can add new skb(s) in the (non empty) queue ? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:21 ` Eric Dumazet @ 2010-11-12 18:27 ` Eric Dumazet 2010-11-12 18:31 ` Alexey Kuznetsov 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 18:27 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore, Damian Lukowski Le vendredi 12 novembre 2010 à 19:21 +0100, Eric Dumazet a écrit : > Le vendredi 12 novembre 2010 à 19:12 +0100, Eric Dumazet a écrit : > > Le vendredi 12 novembre 2010 à 20:57 +0300, Alexey Kuznetsov a écrit : > > > Hello! > > > > > > I looked at tcp_v4_err() and found something strange. Quite non-trivial operations > > > are performed on unlocked sockets. It looks like at least this BUG_ON(): > > > > > > skb = tcp_write_queue_head(sk); > > > BUG_ON(!skb); > > > > > > can be easily triggered. > > > > > > Do I miss something? > > > > > > > Hi Alexey ! > > > > I see socket is locked around line 368, > > > > bh_lock_sock(sk); > > /* If too many ICMPs get dropped on busy > > * servers this needs to be solved differently. > > */ > > if (sock_owned_by_user(sk)) > > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > > > > Hmm, maybe some goto is missing ;) > > > > Well, goto is not missing. > > Why do you think BUG_ON(!skb) can be triggered ? > > We test before : > > if (seq != tp->snd_una || !icsk->icsk_retransmits || > !icsk->icsk_backoff) > break; > > So a concurrent user only can add new skb(s) in the (non empty) queue ? > > Oh well, it seems you are right (backlog processing) Bug was introduced in commit f1ecd5d9e736660 (Revert Backoff [v3]: Revert RTO on ICMP destination unreachable) from Damian Lukowski ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:27 ` Eric Dumazet @ 2010-11-12 18:31 ` Alexey Kuznetsov 0 siblings, 0 replies; 31+ messages in thread From: Alexey Kuznetsov @ 2010-11-12 18:31 UTC (permalink / raw) To: Eric Dumazet Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore, Damian Lukowski Hello! > Oh well, it seems you are right (backlog processing) Exactly. Alexey ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:12 ` Eric Dumazet 2010-11-12 18:21 ` Eric Dumazet @ 2010-11-12 18:29 ` Alexey Kuznetsov 2010-11-12 18:33 ` Eric Dumazet 1 sibling, 1 reply; 31+ messages in thread From: Alexey Kuznetsov @ 2010-11-12 18:29 UTC (permalink / raw) To: Eric Dumazet Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore Hello! On Fri, Nov 12, 2010 at 07:12:58PM +0100, Eric Dumazet wrote: > I see socket is locked around line 368, > > bh_lock_sock(sk); > /* If too many ICMPs get dropped on busy > * servers this needs to be solved differently. > */ > if (sock_owned_by_user(sk)) > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > Hmm, maybe some goto is missing ;) It is not missing, sock_owned_by_user() is checked later when some operation which cannot be done without lock is required. It was done to save error in sk_err_soft even when socket is locked. This code also _understands_ this: look at } else if (sock_owned_by_user(sk)) { /* RTO revert clocked out retransmission, * but socket is locked. Will defer. */ inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, HZ/20, TCP_RTO_MAX); but somehow it considers the manipulations with rto/backoff/write_queue as safe. Seems, they are not. Alexey ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:29 ` Alexey Kuznetsov @ 2010-11-12 18:33 ` Eric Dumazet 2010-11-12 19:22 ` David Miller 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 18:33 UTC (permalink / raw) To: Alexey Kuznetsov Cc: Patrick McHardy, David Lamparter, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, paul.moore, Damian Lukowski Le vendredi 12 novembre 2010 à 21:29 +0300, Alexey Kuznetsov a écrit : > Hello! > > On Fri, Nov 12, 2010 at 07:12:58PM +0100, Eric Dumazet wrote: > > I see socket is locked around line 368, > > > > bh_lock_sock(sk); > > /* If too many ICMPs get dropped on busy > > * servers this needs to be solved differently. > > */ > > if (sock_owned_by_user(sk)) > > NET_INC_STATS_BH(net, LINUX_MIB_LOCKDROPPEDICMPS); > > > > > > Hmm, maybe some goto is missing ;) > > It is not missing, sock_owned_by_user() is checked later when some operation which > cannot be done without lock is required. It was done to save error in sk_err_soft even > when socket is locked. > > This code also _understands_ this: look at > > } else if (sock_owned_by_user(sk)) { > /* RTO revert clocked out retransmission, > * but socket is locked. Will defer. */ > inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > HZ/20, TCP_RTO_MAX); > > but somehow it considers the manipulations with rto/backoff/write_queue as safe. > Seems, they are not. Indeed, right you are, I came to same conclusion. I CC Damian Lukowski in my previous answer (and this one too) Thanks ! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 18:33 ` Eric Dumazet @ 2010-11-12 19:22 ` David Miller 2010-11-12 21:18 ` Eric Dumazet 0 siblings, 1 reply; 31+ messages in thread From: David Miller @ 2010-11-12 19:22 UTC (permalink / raw) To: eric.dumazet Cc: kuznet, kaber, equinox, eparis, hzhong, netdev, linux-kernel, pekkas, jmorris, yoshfuji, paul.moore, damian From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 12 Nov 2010 19:33:23 +0100 > I CC Damian Lukowski in my previous answer (and this one too) Probably the safest fix is this: diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8f8527d..69ccbc1 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -415,6 +415,9 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) !icsk->icsk_backoff) break; + if (sock_owned_by_user(sk)) + break; + icsk->icsk_backoff--; inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) << icsk->icsk_backoff; @@ -429,11 +432,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) if (remaining) { inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, remaining, TCP_RTO_MAX); - } else if (sock_owned_by_user(sk)) { - /* RTO revert clocked out retransmission, - * but socket is locked. Will defer. */ - inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, - HZ/20, TCP_RTO_MAX); } else { /* RTO revert clocked out retransmission. * Will retransmit now */ ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 19:22 ` David Miller @ 2010-11-12 21:18 ` Eric Dumazet 2010-11-12 21:36 ` David Miller 0 siblings, 1 reply; 31+ messages in thread From: Eric Dumazet @ 2010-11-12 21:18 UTC (permalink / raw) To: David Miller Cc: kuznet, kaber, equinox, eparis, hzhong, netdev, linux-kernel, pekkas, jmorris, yoshfuji, paul.moore, damian Le vendredi 12 novembre 2010 à 11:22 -0800, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Fri, 12 Nov 2010 19:33:23 +0100 > > > I CC Damian Lukowski in my previous answer (and this one too) > > Probably the safest fix is this: > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 8f8527d..69ccbc1 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -415,6 +415,9 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) > !icsk->icsk_backoff) > break; > > + if (sock_owned_by_user(sk)) > + break; > + > icsk->icsk_backoff--; > inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) << > icsk->icsk_backoff; > @@ -429,11 +432,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) > if (remaining) { > inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > remaining, TCP_RTO_MAX); > - } else if (sock_owned_by_user(sk)) { > - /* RTO revert clocked out retransmission, > - * but socket is locked. Will defer. */ > - inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > - HZ/20, TCP_RTO_MAX); > } else { > /* RTO revert clocked out retransmission. > * Will retransmit now */ Acked-by: Eric Dumazet <eric.dumazet@gmail.com> ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: a problem tcp_v4_err() 2010-11-12 21:18 ` Eric Dumazet @ 2010-11-12 21:36 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2010-11-12 21:36 UTC (permalink / raw) To: eric.dumazet Cc: kuznet, kaber, equinox, eparis, hzhong, netdev, linux-kernel, pekkas, jmorris, yoshfuji, paul.moore, damian From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 12 Nov 2010 22:18:38 +0100 > Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Thanks for reviewing, here is the final bit I committed and I'll queue this up for whatever -stable trees need this. -------------------- tcp: Don't change unlocked socket state in tcp_v4_err(). Alexey Kuznetsov noticed a regression introduced by commit f1ecd5d9e7366609d640ff4040304ea197fbc618 ("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable") The RTO and timer modification code added to tcp_v4_err() doesn't check sock_owned_by_user(), which if true means we don't have exclusive access to the socket and therefore cannot modify it's critical state. Just skip this new code block if sock_owned_by_user() is true and eliminate the now superfluous sock_owned_by_user() code block contained within. Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> CC: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> --- net/ipv4/tcp_ipv4.c | 8 +++----- 1 files changed, 3 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 8f8527d..69ccbc1 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -415,6 +415,9 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) !icsk->icsk_backoff) break; + if (sock_owned_by_user(sk)) + break; + icsk->icsk_backoff--; inet_csk(sk)->icsk_rto = __tcp_set_rto(tp) << icsk->icsk_backoff; @@ -429,11 +432,6 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) if (remaining) { inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, remaining, TCP_RTO_MAX); - } else if (sock_owned_by_user(sk)) { - /* RTO revert clocked out retransmission, - * but socket is locked. Will defer. */ - inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, - HZ/20, TCP_RTO_MAX); } else { /* RTO revert clocked out retransmission. * Will retransmit now */ -- 1.7.3.2 ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 16:54 ` Patrick McHardy 2010-11-12 17:57 ` a problem tcp_v4_err() Alexey Kuznetsov @ 2010-11-12 21:16 ` David Lamparter 2010-11-12 21:18 ` David Miller 1 sibling, 1 reply; 31+ messages in thread From: David Lamparter @ 2010-11-12 21:16 UTC (permalink / raw) To: Patrick McHardy Cc: David Lamparter, Eric Dumazet, Eric Paris, Hua Zhong, netdev, linux-kernel, davem, kuznet, pekkas, jmorris, yoshfuji, paul.moore On Fri, Nov 12, 2010 at 05:54:29PM +0100, Patrick McHardy wrote: > Am 12.11.2010 17:35, schrieb David Lamparter: > > On Fri, Nov 12, 2010 at 05:15:32PM +0100, Eric Dumazet wrote: > >> Le vendredi 12 novembre 2010 à 11:08 -0500, Eric Paris a écrit : > >> > >>> 2) What should the generic TCP code (tcp_connect()) do if the skb failed > >>> to send. Should it return error codes back up the stack somehow or > >>> should they continue to be ignored? Obviously continuing to just ignore > >>> information we have doesn't make me happy (otherwise I wouldn't have > >>> started scratching this itch). But the point about ENOBUFS is well > >>> taken. Maybe I should make tcp_connect(), or the caller to > >>> tcp_connect() more intelligent about specific error codes? > >>> > >>> I'm looking for a path forward. If SELinux is rejecting the SYN packets > >>> on connect() I want to pass that info to userspace rather than just > >>> hanging. What's the best way to accomplish that? > >>> > >> > >> Eric, if you can differentiate a permanent reject, instead of a > >> temporary one (congestion, or rate limiting, or ENOBUF, or ...), then > >> yes, you could make tcp_connect() report to user the permanent error, > >> and ignore the temporary one. > > Indeed. We could even make the NF_DROP return value configurable > by encoding it in the verdict. > There is no NF_REJECT. Ah, sorry, not at home in netfilter, coming from an user perspective here. > Returning NF_DROP results in -EPERM getting reported back. As Eric > noticed, this is ignored for SYN packets. Hrm. But how do you silently drop packets? This seems counterintuitive or even buggy to me; or at least the netfilter DROP target shouldn't use this kind of error-reporting drop. As food for thought I'd like to pose the following rule: iptables -A OUTPUT -m statistic --mode nth --every 5 -j DROP which should, to my understanding, still allow the connect to complete, even if the first SYN got (silently!...) dropped. Also, i'm *very* sure i was able to trigger a "permission denied" from either firewall or route rules; weirdly enough i can't get that on my 2.6.35.7 router... (poking older boxes to reproduce it right now) Just for reference some test results: (heavily cropped) TL;DR: only tcp-reset and route prohibit work immediately. + telnet 74.125.43.105 80 Connected to 74.125.43.105. + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT # default w/o reject-with is icmp-port-unreachable + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Connection refused real 0m3.014s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with tcp-reset + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Connection refused real 0m0.007s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with host-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: No route to host real 0m3.010s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with admin-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: No route to host real 0m3.009s + iptables -I OUTPUT -p tcp -d 74.125.43.105 --dport 80 -j REJECT --reject-with net-prohib + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Network is unreachable real 0m3.011s + iptables -F OUTPUT + ip route add prohibit 74.125.43.105 + ip route flush cache + telnet 74.125.43.105 80 telnet: connect to address 74.125.43.105: Network is unreachable real 0m0.007s -David ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 21:16 ` [RFC PATCH] network: return errors if we know tcp_connect failed David Lamparter @ 2010-11-12 21:18 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2010-11-12 21:18 UTC (permalink / raw) To: equinox Cc: kaber, eric.dumazet, eparis, hzhong, netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, paul.moore From: David Lamparter <equinox@diac24.net> Date: Fri, 12 Nov 2010 22:16:27 +0100 > As food for thought I'd like to pose the following rule: > iptables -A OUTPUT -m statistic --mode nth --every 5 -j DROP > which should, to my understanding, still allow the connect to complete, > even if the first SYN got (silently!...) dropped. Yes, I agree and this is pretty much the point I tried to make earlier. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-11 21:03 [RFC PATCH] network: return errors if we know tcp_connect failed Eric Paris 2010-11-11 21:14 ` Eric Dumazet 2010-11-11 21:58 ` Hua Zhong @ 2010-11-12 17:46 ` Alexey Kuznetsov 2010-11-12 19:28 ` David Miller 2 siblings, 1 reply; 31+ messages in thread From: Alexey Kuznetsov @ 2010-11-12 17:46 UTC (permalink / raw) To: Eric Paris; +Cc: netdev, linux-kernel, davem, pekkas, jmorris, yoshfuji, kaber Hello! On Thu, Nov 11, 2010 at 04:03:41PM -0500, Eric Paris wrote: > immediately when it calls connect(). Is this wrong? Is this bad to tell > userspace more quickly what happened? Does passing this error code back up > the stack here break something else? Why do some functions seem to pay > attention to tcp_transmit_skb() return codes and some functions just ignore > it? Essentially, return value of tcp_transmit_skb() is always ignored. It is used only for accounting and for some optimization of retransmission behaviour. Generally, tcp does not react on errors coming outside of tcp protocol. The only loophole is ICMP error in the same case as yours. In _violation_ of specs linux immediately aborts unestablished connect on an icmp error. IMHO that thing which you suggest is correct (of course, provided you filter out transient errors and react only to EPERM or something like this). It was not done because it was expected firewall rule prescribing immediate abort is configured with "--reject-with icmp-port-unreachable", otherwise the rule orders real blackhole. Alexey ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] network: return errors if we know tcp_connect failed 2010-11-12 17:46 ` Alexey Kuznetsov @ 2010-11-12 19:28 ` David Miller 0 siblings, 0 replies; 31+ messages in thread From: David Miller @ 2010-11-12 19:28 UTC (permalink / raw) To: kuznet; +Cc: eparis, netdev, linux-kernel, pekkas, jmorris, yoshfuji, kaber From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Date: Fri, 12 Nov 2010 20:46:20 +0300 > The only loophole is ICMP error in the same case as yours. In > _violation_ of specs linux immediately aborts unestablished connect > on an icmp error. IMHO that thing which you suggest is correct (of > course, provided you filter out transient errors and react only to > EPERM or something like this). It was not done because it was > expected firewall rule prescribing immediate abort is configured > with "--reject-with icmp-port-unreachable", otherwise the rule > orders real blackhole. The idea to signal on -EPERM might be OK, but if that's also what things like "-m statistical" and friends end up reporting then we still cannot do it. ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2010-11-15 20:01 UTC | newest] Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-11-11 21:03 [RFC PATCH] network: return errors if we know tcp_connect failed Eric Paris 2010-11-11 21:14 ` Eric Dumazet 2010-11-11 21:58 ` Hua Zhong 2010-11-12 7:36 ` Patrick McHardy 2010-11-12 23:14 ` Hua Zhong 2010-11-15 10:32 ` Patrick McHardy 2010-11-15 15:47 ` Eric Paris 2010-11-15 15:57 ` Patrick McHardy 2010-11-15 16:04 ` Patrick McHardy 2010-11-15 16:36 ` Patrick McHardy 2010-11-15 16:46 ` David Miller 2010-11-15 20:00 ` Alexey Kuznetsov 2010-11-12 16:08 ` Eric Paris 2010-11-12 16:15 ` Eric Dumazet 2010-11-12 16:35 ` David Lamparter 2010-11-12 16:53 ` Eric Paris 2010-11-12 16:54 ` Patrick McHardy 2010-11-12 17:57 ` a problem tcp_v4_err() Alexey Kuznetsov 2010-11-12 18:12 ` Eric Dumazet 2010-11-12 18:21 ` Eric Dumazet 2010-11-12 18:27 ` Eric Dumazet 2010-11-12 18:31 ` Alexey Kuznetsov 2010-11-12 18:29 ` Alexey Kuznetsov 2010-11-12 18:33 ` Eric Dumazet 2010-11-12 19:22 ` David Miller 2010-11-12 21:18 ` Eric Dumazet 2010-11-12 21:36 ` David Miller 2010-11-12 21:16 ` [RFC PATCH] network: return errors if we know tcp_connect failed David Lamparter 2010-11-12 21:18 ` David Miller 2010-11-12 17:46 ` Alexey Kuznetsov 2010-11-12 19:28 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).