From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willy Tarreau Subject: Re: : getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP Date: Sat, 12 Jan 2019 18:26:36 +0100 Message-ID: <20190112172636.GA26639@1wt.eu> References: <20190107111753.aiabujukqx3eteqb@breakpoint.cc> <2e83651c-df8b-8341-4170-df328e3d756a@ltri.eu> <20190112160400.dblitzk2ftlfzryd@breakpoint.cc> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Lukas Tribus , Mohandass Roobesh , netdev@vger.kernel.org To: Florian Westphal Return-path: Received: from wtarreau.pck.nerim.net ([62.212.114.60]:34332 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725842AbfALR0q (ORCPT ); Sat, 12 Jan 2019 12:26:46 -0500 Content-Disposition: inline In-Reply-To: <20190112160400.dblitzk2ftlfzryd@breakpoint.cc> Sender: netdev-owner@vger.kernel.org List-ID: Hi Florian! Sorry for the lag, I was busy with other issues. On Sat, Jan 12, 2019 at 05:04:00PM +0100, Florian Westphal wrote: > Lukas Tribus wrote: > > The application (haproxy) needs to know the original destination IP > > address, however it does not know whether -j REDIRECT was used or not. > > Because of this the application always queries SO_ORIGINAL_DST, and > > this includes configurations without -j REDIRECT. > > > > Are you saying the behavior of SO_ORIGINAL_DST is undefined when not > > used with -j REDIRECT and that this issue does not happen when -j > > REDIRECT is actually used? > > No, thats not what I said. Because OP provided a link that mentions > TPROXY, I concluded OP was using TPROXY, so I pointed out that the > error source can be completely avoided by not using SO_ORIGINAL_DST. > > As I said, SO_ORIGINAL_DST returns the dst address of > the original direction *as seen by conntrack*. > > In case REDIRECT or DNAT was used, the address returned is the on-wire > one, before DNAT rewrite took place. > > Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was > used. If no DNAT rewrite takes place, sockaddr returned by accept or > getsockname can be used directly and SO_ORIGINAL_DST isn't needed. > The returned address should be identical to the one given by accept(). That's also the way we're using it : http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/proto_tcp.c;h=52a4691d2ba93aec93a3e8a0edd1e90d93de968d;hb=HEAD#l600 More precisely if getsockopt() fails we only use getsockname(). It happens that this code is very old (2002) and used to work with kernel 2.4's TPROXY which did rely on conntrack, hence the ifdef including TPROXY. But it's irrelevant here, with a modern TPROXY the getsockopt() here will usually fail, and the result will come from getsockname() only. When seeing the (old) haproxy code there, I've been thinking about another possibility. Right now haproxy does getsockname() then *tries* getsockopt(). In the unlikely case getsockopt() would modify part of the address already returned by getsockname(), it could leave an incorrect address there. But such an issue would require partial uses of copy_to_user() which doesn't make much sense, and getorigdst() doesn't do this either. So this one was ruled out. I've been wondering if it was possible that from time to time, this getsockopt() accidently succeeds and returns something wrong, maybe due to a race with another thread, or maybe because it would return an uninitialized field which happened to see the source address previously. The very low amount of occurrence makes me think it could possibly happen only upon certain error conditions. But looking at getorigdst(), I hardly imagine how this would happen. > If SO_ORIGINAL_DST returns the reply, then conntrack picked up > a reply packet as the first packet of the connection, so it believes > originator is the responder and vice versa. > > One case where this can happen is when nf_conntrack_tcp_loose > (mid-stream pickup) is enabled. Very interesting case indeed, I hadn't thought about it! I think we don't have enough info from the original reporter's setup but it would definitely make sense and explain why it's the other end which is retrieved! I'm seeing one possibility to explain this : Let's say the OP's setup has a short conntrack timeout and a longer haproxy timeout. If the address is only retrieved for logging, it will be retrieved at the end of the connection. Let's assume haproxy receives a request from a client, a conntrack entry is created and haproxy forwards the request to a very slow server. Before the server responds, the conntrack entry expires, then the server responds and haproxy forwards to the client, re-creating the entry and hitting this case before the address is picked up for logging. Roobesh, do you use the destination address only for logging or anywhere else in the request path ? And could you check if you have nf_conntrack_tcp_loose set as Florian suggests ? I really think he figured it right. If that's the problem, I think we can address it with documentation : first, warn about the use case, second, explain how to store the address into a variable once the connection is accepted since it will not have expired and will still be valid at this moment. > This is not a haproxy bug. > > Only thing that haproxy could is to provide a knob to make it only > use addresses returned by accept, rather than relying on > SO_ORIGINAL_DST for those that use TPROXY to do MITM interception. Since it's the destination we get it using getsockname(), not accept(), but yeah maybe we'd benefit from having a tunable to disable the use of getsockopt(SO_ORIGINAL_DST). Actually I'd prefer to avoid doing this until the issue it confirmed because I don't feel comfortable with having an option saying "disable the use of getsockopt(SO_ORIGINAL_DST) which may fail once per million due to some obscure reasons". However if it's confirmed that it indeed is the scenario above that happens, we could possibly think about adding a setting to work around if the doc approach is not enough. Many thanks for your insights, these are exactly the reason I asked Roobesh to bring the issue here :-) Cheers, Willy From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_PASS,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F1573C43387 for ; Sat, 12 Jan 2019 17:26:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BE2E72084E for ; Sat, 12 Jan 2019 17:26:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726436AbfALR0q (ORCPT ); Sat, 12 Jan 2019 12:26:46 -0500 Received: from wtarreau.pck.nerim.net ([62.212.114.60]:34332 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725842AbfALR0q (ORCPT ); Sat, 12 Jan 2019 12:26:46 -0500 Received: (from willy@localhost) by pcw.home.local (8.15.2/8.15.2/Submit) id x0CHQaC4026704; Sat, 12 Jan 2019 18:26:36 +0100 Date: Sat, 12 Jan 2019 18:26:36 +0100 From: Willy Tarreau To: Florian Westphal Cc: Lukas Tribus , Mohandass Roobesh , netdev@vger.kernel.org Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP Message-ID: <20190112172636.GA26639@1wt.eu> References: <20190107111753.aiabujukqx3eteqb@breakpoint.cc> <2e83651c-df8b-8341-4170-df328e3d756a@ltri.eu> <20190112160400.dblitzk2ftlfzryd@breakpoint.cc> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190112160400.dblitzk2ftlfzryd@breakpoint.cc> User-Agent: Mutt/1.6.1 (2016-04-27) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Message-ID: <20190112172636.JbmEw-sVbXboYQTRGxCodGkttRj1wX0mZtUV8nlo-XI@z> Hi Florian! Sorry for the lag, I was busy with other issues. On Sat, Jan 12, 2019 at 05:04:00PM +0100, Florian Westphal wrote: > Lukas Tribus wrote: > > The application (haproxy) needs to know the original destination IP > > address, however it does not know whether -j REDIRECT was used or not. > > Because of this the application always queries SO_ORIGINAL_DST, and > > this includes configurations without -j REDIRECT. > > > > Are you saying the behavior of SO_ORIGINAL_DST is undefined when not > > used with -j REDIRECT and that this issue does not happen when -j > > REDIRECT is actually used? > > No, thats not what I said. Because OP provided a link that mentions > TPROXY, I concluded OP was using TPROXY, so I pointed out that the > error source can be completely avoided by not using SO_ORIGINAL_DST. > > As I said, SO_ORIGINAL_DST returns the dst address of > the original direction *as seen by conntrack*. > > In case REDIRECT or DNAT was used, the address returned is the on-wire > one, before DNAT rewrite took place. > > Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was > used. If no DNAT rewrite takes place, sockaddr returned by accept or > getsockname can be used directly and SO_ORIGINAL_DST isn't needed. > The returned address should be identical to the one given by accept(). That's also the way we're using it : http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/proto_tcp.c;h=52a4691d2ba93aec93a3e8a0edd1e90d93de968d;hb=HEAD#l600 More precisely if getsockopt() fails we only use getsockname(). It happens that this code is very old (2002) and used to work with kernel 2.4's TPROXY which did rely on conntrack, hence the ifdef including TPROXY. But it's irrelevant here, with a modern TPROXY the getsockopt() here will usually fail, and the result will come from getsockname() only. When seeing the (old) haproxy code there, I've been thinking about another possibility. Right now haproxy does getsockname() then *tries* getsockopt(). In the unlikely case getsockopt() would modify part of the address already returned by getsockname(), it could leave an incorrect address there. But such an issue would require partial uses of copy_to_user() which doesn't make much sense, and getorigdst() doesn't do this either. So this one was ruled out. I've been wondering if it was possible that from time to time, this getsockopt() accidently succeeds and returns something wrong, maybe due to a race with another thread, or maybe because it would return an uninitialized field which happened to see the source address previously. The very low amount of occurrence makes me think it could possibly happen only upon certain error conditions. But looking at getorigdst(), I hardly imagine how this would happen. > If SO_ORIGINAL_DST returns the reply, then conntrack picked up > a reply packet as the first packet of the connection, so it believes > originator is the responder and vice versa. > > One case where this can happen is when nf_conntrack_tcp_loose > (mid-stream pickup) is enabled. Very interesting case indeed, I hadn't thought about it! I think we don't have enough info from the original reporter's setup but it would definitely make sense and explain why it's the other end which is retrieved! I'm seeing one possibility to explain this : Let's say the OP's setup has a short conntrack timeout and a longer haproxy timeout. If the address is only retrieved for logging, it will be retrieved at the end of the connection. Let's assume haproxy receives a request from a client, a conntrack entry is created and haproxy forwards the request to a very slow server. Before the server responds, the conntrack entry expires, then the server responds and haproxy forwards to the client, re-creating the entry and hitting this case before the address is picked up for logging. Roobesh, do you use the destination address only for logging or anywhere else in the request path ? And could you check if you have nf_conntrack_tcp_loose set as Florian suggests ? I really think he figured it right. If that's the problem, I think we can address it with documentation : first, warn about the use case, second, explain how to store the address into a variable once the connection is accepted since it will not have expired and will still be valid at this moment. > This is not a haproxy bug. > > Only thing that haproxy could is to provide a knob to make it only > use addresses returned by accept, rather than relying on > SO_ORIGINAL_DST for those that use TPROXY to do MITM interception. Since it's the destination we get it using getsockname(), not accept(), but yeah maybe we'd benefit from having a tunable to disable the use of getsockopt(SO_ORIGINAL_DST). Actually I'd prefer to avoid doing this until the issue it confirmed because I don't feel comfortable with having an option saying "disable the use of getsockopt(SO_ORIGINAL_DST) which may fail once per million due to some obscure reasons". However if it's confirmed that it indeed is the scenario above that happens, we could possibly think about adding a setting to work around if the doc approach is not enough. Many thanks for your insights, these are exactly the reason I asked Roobesh to bring the issue here :-) Cheers, Willy