All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Mohandass, Roobesh" <Roobesh_Mohandass@McAfee.com>
To: Willy Tarreau <w@1wt.eu>, Lukas Tribus <lists@ltri.eu>
Cc: Florian Westphal <fw@strlen.de>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: RE: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Thu, 17 Jan 2019 05:23:59 +0000	[thread overview]
Message-ID: <MWHPR16MB1502F0C1FAD55777E7B228BFED830@MWHPR16MB1502.namprd16.prod.outlook.com> (raw)
In-Reply-To: <20190112183344.GA26847@1wt.eu>

[-- Attachment #1: Type: text/plain, Size: 3113 bytes --]

Hi Willy/Florian/Lukas,

Thanks for your help around this concern, sorry for the delayed response. I will test this out and get back to you.

-Roobesh G M

-----Original Message-----
From: Willy Tarreau <w@1wt.eu> 
Sent: Sunday, January 13, 2019 12:04 AM
To: Lukas Tribus <lists@ltri.eu>
Cc: Mohandass, Roobesh <Roobesh_Mohandass@McAfee.com>; Florian Westphal <fw@strlen.de>; netdev@vger.kernel.org
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP

This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi Lukas,

On Sat, Jan 12, 2019 at 07:01:34PM +0100, Lukas Tribus wrote:
> > Roobesh, do you use the destination address only for logging or 
> > anywhere else in the request path ? And could you check if you have 
> > nf_conntrack_tcp_loose set as Florian suggests ? I really think he 
> > figured it right.
> 
> It's about what we send with the PROXY protocol to the backend server, 
> Roobesh reported things like that (src and dst is the same):
> 
> PROXY TCP4 192.220.26.39 192.220.26.39 45066 45066 PROXY TCP4 
> 192.220.26.39 192.220.26.39 45075 45075
> 
> So the call would actually happen at the beginning of the TCP connection.

That sounds quite shocking to me then. Maybe we're facing such a sequence:

 1) first session comes from this port, and client closes first (FIN)
 2) haproxy then closes with (FIN)
 3) client doesn't respond with the final ACK (e.g. blocked by another
    firewall in between or the client's own conntrack)
 4) the socket on the haproxy side remains in LAST_ACK state and ACKs
    are periodically resent
 5) local conntrack is in TIME_WAIT and expires faster than the largest
    interval between two such ACKs
 6) one of these retransmitted ACKs reopens the connection in reverse
    direction due to nf_conntrack_tcp_loose. The connection is then
    seen in ESTABLISHED state and might be kept a bit longer.
 8) the connection finally expires in the local TCP stack but not yet
    in conntrack.
 7) later the client reuses the same source port while the connection
    is still present in the conntrack table.
 8) assuming tcp_be_liberal is also set, the packets can pass through
    conntrack and establish a new connection to haproxy.
 9) haproxy calls getsockopt(SO_ORIGINAL_DST) and gets the other end
    point since the connection was created at step 6 above in the
    other direction.

I could be wrong on certain specific points above but it looks plausible. 

> Initial report is here:
> https://discourse.haproxy.org/t/send-proxy-not-modifying-some-traffic-
> with-proxy-ip-port-details/3336

Ah cool, I didn't have all this, thank you!

> Let's see if disabling nf_conntrack_tcp_loose changes things.

Yes this really is the only thing I can think of, and in this case noone is wrong in this chain (neither kernel nor haproxy). We'll need to document it in my opinion.

Thanks,
Willy

[-- Attachment #2: Type: message/rfc822, Size: 11046 bytes --]

From: Willy Tarreau <w@1wt.eu>
To: Florian Westphal <fw@strlen.de>
Cc: Lukas Tribus <lists@ltri.eu>, "Mohandass, Roobesh" <Roobesh_Mohandass@McAfee.com>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 17:26:36 +0000
Message-ID: <20190112172636.GA26639@1wt.eu>

This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hi Florian!

Sorry for the lag, I was busy with other issues.

On Sat, Jan 12, 2019 at 05:04:00PM +0100, Florian Westphal wrote:
> Lukas Tribus <lists@ltri.eu> wrote:
> > The application (haproxy) needs to know the original destination IP
> > address, however it does not know whether -j REDIRECT was used or not.
> > Because of this the application always queries SO_ORIGINAL_DST, and
> > this includes configurations without -j REDIRECT.
> >
> > Are you saying the behavior of SO_ORIGINAL_DST is undefined when not
> > used with -j REDIRECT and that this issue does not happen when -j
> > REDIRECT is actually used?
>
> No, thats not what I said.  Because OP provided a link that mentions
> TPROXY, I concluded OP was using TPROXY, so I pointed out that the
> error source can be completely avoided by not using SO_ORIGINAL_DST.
>
> As I said, SO_ORIGINAL_DST returns the dst address of
> the original direction *as seen by conntrack*.
>
> In case REDIRECT or DNAT was used, the address returned is the on-wire
> one, before DNAT rewrite took place.
>
> Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was
> used. If no DNAT rewrite takes place, sockaddr returned by accept or
> getsockname can be used directly and SO_ORIGINAL_DST isn't needed.
> The returned address should be identical to the one given by accept().

That's also the way we're using it :

     http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/proto_tcp.c;h=52a4691d2ba93aec93a3e8a0edd1e90d93de968d;hb=HEAD#l600

More precisely if getsockopt() fails we only use getsockname(). It
happens that this code is very old (2002) and used to work with kernel
2.4's TPROXY which did rely on conntrack, hence the ifdef including
TPROXY. But it's irrelevant here, with a modern TPROXY the getsockopt()
here will usually fail, and the result will come from getsockname()
only.

When seeing the (old) haproxy code there, I've been thinking about
another possibility. Right now haproxy does getsockname() then *tries*
getsockopt(). In the unlikely case getsockopt() would modify part of
the address already returned by getsockname(), it could leave an
incorrect address there. But such an issue would require partial uses
of copy_to_user() which doesn't make much sense, and getorigdst()
doesn't do this either. So this one was ruled out.

I've been wondering if it was possible that from time to time, this
getsockopt() accidently succeeds and returns something wrong, maybe
due to a race with another thread, or maybe because it would return
an uninitialized field which happened to see the source address
previously. The very low amount of occurrence makes me think it could
possibly happen only upon certain error conditions. But looking at
getorigdst(), I hardly imagine how this would happen.

> If SO_ORIGINAL_DST returns the reply, then conntrack picked up
> a reply packet as the first packet of the connection, so it believes
> originator is the responder and vice versa.
>
> One case where this can happen is when nf_conntrack_tcp_loose
> (mid-stream pickup) is enabled.

Very interesting case indeed, I hadn't thought about it! I think we
don't have enough info from the original reporter's setup but it
would definitely make sense and explain why it's the other end which
is retrieved!

I'm seeing one possibility to explain this : Let's say the OP's setup
has a short conntrack timeout and a longer haproxy timeout. If the
address is only retrieved for logging, it will be retrieved at the
end of the connection. Let's assume haproxy receives a request from
a client, a conntrack entry is created and haproxy forwards the request
to a very slow server. Before the server responds, the conntrack entry
expires, then the server responds and haproxy forwards to the client,
re-creating the entry and hitting this case before the address is
picked up for logging.

Roobesh, do you use the destination address only for logging or
anywhere else in the request path ? And could you check if you have
nf_conntrack_tcp_loose set as Florian suggests ? I really think he
figured it right.

If that's the problem, I think we can address it with documentation :
first, warn about the use case, second, explain how to store the
address into a variable once the connection is accepted since it will
not have expired and will still be valid at this moment.

> This is not a haproxy bug.
>
> Only thing that haproxy could is to provide a knob to make it only
> use addresses returned by accept, rather than relying on
> SO_ORIGINAL_DST for those that use TPROXY to do MITM interception.

Since it's the destination we get it using getsockname(), not accept(),
but yeah maybe we'd benefit from having a tunable to disable the use of
getsockopt(SO_ORIGINAL_DST). Actually I'd prefer to avoid doing this
until the issue it confirmed because I don't feel comfortable with having
an option saying "disable the use of getsockopt(SO_ORIGINAL_DST) which
may fail once per million due to some obscure reasons". However if it's
confirmed that it indeed is the scenario above that happens, we could
possibly think about adding a setting to work around if the doc approach
is not enough.

Many thanks for your insights, these are exactly the reason I asked
Roobesh to bring the issue here :-)

Cheers,
Willy

[-- Attachment #3: Type: message/rfc822, Size: 7531 bytes --]

From: Florian Westphal <fw@strlen.de>
To: Lukas Tribus <lists@ltri.eu>
Cc: Florian Westphal <fw@strlen.de>, "Mohandass, Roobesh" <Roobesh_Mohandass@McAfee.com>, Willy Tarreau <w@1wt.eu>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 16:04:00 +0000
Message-ID: <20190112160400.dblitzk2ftlfzryd@breakpoint.cc>

This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Lukas Tribus <lists@ltri.eu> wrote:
> The application (haproxy) needs to know the original destination IP
> address, however it does not know whether -j REDIRECT was used or not.
> Because of this the application always queries SO_ORIGINAL_DST, and
> this includes configurations without -j REDIRECT.
>
> Are you saying the behavior of SO_ORIGINAL_DST is undefined when not
> used with -j REDIRECT and that this issue does not happen when -j
> REDIRECT is actually used?

No, thats not what I said.  Because OP provided a link that mentions
TPROXY, I concluded OP was using TPROXY, so I pointed out that the
error source can be completely avoided by not using SO_ORIGINAL_DST.

As I said, SO_ORIGINAL_DST returns the dst address of
the original direction *as seen by conntrack*.

In case REDIRECT or DNAT was used, the address returned is the on-wire
one, before DNAT rewrite took place.

Therefore, SO_ORIGINAL_DST is only needed when REDIRECT or DNAT was
used. If no DNAT rewrite takes place, sockaddr returned by accept or
getsockname can be used directly and SO_ORIGINAL_DST isn't needed.
The returned address should be identical to the one given by accept().

If SO_ORIGINAL_DST returns the reply, then conntrack picked up
a reply packet as the first packet of the connection, so it believes
originator is the responder and vice versa.

One case where this can happen is when nf_conntrack_tcp_loose
(mid-stream pickup) is enabled.

This is not a haproxy bug.

Only thing that haproxy could is to provide a knob to make it only
use addresses returned by accept, rather than relying on
SO_ORIGINAL_DST for those that use TPROXY to do MITM interception.

[-- Attachment #4: Type: message/rfc822, Size: 7573 bytes --]

From: Lukas Tribus <lists@ltri.eu>
To: Florian Westphal <fw@strlen.de>
Cc: "Mohandass, Roobesh" <Roobesh_Mohandass@McAfee.com>, Willy Tarreau <w@1wt.eu>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 14:37:11 +0000
Message-ID: <2e83651c-df8b-8341-4170-df328e3d756a@ltri.eu>

This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hello,


>> getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes
>> returning the source IP instead the destination IP. Using getsockname()
>> instead looks like solving the issue.
>>
>> For just an example:
>> Out of 6569124 requests , 4 requests were wrong 0.000060891 % (this is just
>> an rough estimate to give you idea on frequency)
>>
>> Some old reference: (similar behavior observed)
>> stackoverflow-link (see earlier)
>
> I ask exactly same question as in your url.
> If you use TPROXY, why do you use SO_ORIGINAL_DST?
>
> Its only required with -j REDIRECT.

TPROXY is not used in the OP's scenario. The stackoverflow question was
an example.

The application (haproxy) needs to know the original destination IP
address, however it does not know whether -j REDIRECT was used or not.
Because of this the application always queries SO_ORIGINAL_DST, and
this includes configurations without -j REDIRECT.

Are you saying the behavior of SO_ORIGINAL_DST is undefined when not
used with -j REDIRECT and that this issue does not happen when -j
REDIRECT is actually used?

That would mean the application would have to expose a new configuration
knob to the user, and require the user to toggle the knob based on
-j REDIRECT usage.



Thanks,
Lukas


[-- Attachment #5: Type: message/rfc822, Size: 8642 bytes --]

From: Lukas Tribus <lists@ltri.eu>
To: Willy Tarreau <w@1wt.eu>, "Mohandass, Roobesh" <Roobesh_Mohandass@McAfee.com>
Cc: Florian Westphal <fw@strlen.de>, "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [NETDEV]: getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP
Date: Sat, 12 Jan 2019 18:01:34 +0000
Message-ID: <CACC_My-0bsFPqSaNCsF2bZhNSo_3o1F+qfy8utKRO7-2CZZqYg@mail.gmail.com>

This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.

Hello!


On Sat, 12 Jan 2019 at 18:26, Willy Tarreau <w@1wt.eu> wrote:
> > One case where this can happen is when nf_conntrack_tcp_loose
> > (mid-stream pickup) is enabled.
>
> Very interesting case indeed, I hadn't thought about it! I think we
> don't have enough info from the original reporter's setup but it
> would definitely make sense and explain why it's the other end which
> is retrieved!
>
> I'm seeing one possibility to explain this : Let's say the OP's setup
> has a short conntrack timeout and a longer haproxy timeout. If the
> address is only retrieved for logging, it will be retrieved at the
> end of the connection. Let's assume haproxy receives a request from
> a client, a conntrack entry is created and haproxy forwards the request
> to a very slow server. Before the server responds, the conntrack entry
> expires, then the server responds and haproxy forwards to the client,
> re-creating the entry and hitting this case before the address is
> picked up for logging.
>
> Roobesh, do you use the destination address only for logging or
> anywhere else in the request path ? And could you check if you have
> nf_conntrack_tcp_loose set as Florian suggests ? I really think he
> figured it right.

It's about what we send with the PROXY protocol to the backend server,
Roobesh reported things like that (src and dst is the same):

PROXY TCP4 192.220.26.39 192.220.26.39 45066 45066
PROXY TCP4 192.220.26.39 192.220.26.39 45075 45075

So the call would actually happen at the beginning of the TCP connection.

Initial report is here:
https://discourse.haproxy.org/t/send-proxy-not-modifying-some-traffic-with-proxy-ip-port-details/3336


Let's see if disabling nf_conntrack_tcp_loose changes things.



Thanks,
Lukas

  reply	other threads:[~2019-01-17  5:36 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <MWHPR16MB1502549F7BAAE102EF1D5DF4EDB50@MWHPR16MB1502.namprd16.prod.outlook.com>
     [not found] ` <MWHPR16MB150233EF69CFD4562BFA65ABEDB60@MWHPR16MB1502.namprd16.prod.outlook.com>
     [not found]   ` <MWHPR16MB1502DC55542EBA8121631B16EDB20@MWHPR16MB1502.namprd16.prod.outlook.com>
2018-12-31  6:20     ` : getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) is in fact sometimes returning the source IP instead the destination IP Mohandass, Roobesh
2019-01-07 11:17       ` Florian Westphal
2019-01-07 14:57         ` Mohandass, Roobesh
2019-01-12 14:37         ` Lukas Tribus
2019-01-12 14:37           ` [NETDEV]: " Lukas Tribus
2019-01-12 16:04           ` : " Florian Westphal
2019-01-12 16:04             ` [NETDEV]: " Florian Westphal
2019-01-12 17:26             ` : " Willy Tarreau
2019-01-12 17:26               ` [NETDEV]: " Willy Tarreau
2019-01-12 18:01               ` : " Lukas Tribus
2019-01-12 18:01                 ` [NETDEV]: " Lukas Tribus
2019-01-12 18:33                 ` : " Willy Tarreau
2019-01-12 18:33                   ` [NETDEV]: " Willy Tarreau
2019-01-17  5:23                   ` Mohandass, Roobesh [this message]
2019-01-23  8:07                     ` Mohandass, Roobesh
2019-01-23  8:54                       ` Willy Tarreau
     [not found]     ` <MWHPR16MB1502A336CCB4EF3F0FCB69ECEDB20@MWHPR16MB1502.namprd16.prod.outlook.com>
2019-01-07  7:58       ` : " Mohandass, Roobesh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR16MB1502F0C1FAD55777E7B228BFED830@MWHPR16MB1502.namprd16.prod.outlook.com \
    --to=roobesh_mohandass@mcafee.com \
    --cc=fw@strlen.de \
    --cc=lists@ltri.eu \
    --cc=netdev@vger.kernel.org \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.