* MSG_ZEROCOPY doesn't work on half-open TCP sockets
@ 2019-01-09 12:01 Marek Majkowski
  2019-01-09 12:48 ` Marek Majkowski
  0 siblings, 1 reply; 4+ messages in thread
From: Marek Majkowski @ 2019-01-09 12:01 UTC (permalink / raw)
  To: Willem de Bruijn, netdev

Hi,

The current implementation of MSG_ZEROCOPY for TCP requires the socket
to be in the ESTABLISHED state:
https://elixir.bootlin.com/linux/v5.0-rc1/source/net/ipv4/tcp.c#L1188

if (sk->sk_state != TCP_ESTABLISHED) {
    err = -EINVAL;
    goto out_err;
}

In TCP it's totally fine to have half-open sockets, for example:

shutdown(5, SHUT_RD)

Moves the socket from ESTABLISHED to CLOSE_WAIT. In such TCP state
it's possible to continue sending data. This is not supported by
MSG_ZEROCOPY, which will fail with EINVAL in such case. I think it's a
bug.

Marek


* Re: MSG_ZEROCOPY doesn't work on half-open TCP sockets
  2019-01-09 12:01 MSG_ZEROCOPY doesn't work on half-open TCP sockets Marek Majkowski
@ 2019-01-09 12:48 ` Marek Majkowski
  2019-01-09 13:55   ` Willem de Bruijn
  0 siblings, 1 reply; 4+ messages in thread
From: Marek Majkowski @ 2019-01-09 12:48 UTC (permalink / raw)
  To: Willem de Bruijn, netdev

I got it slightly wrong, and it's even worse than this. As far as I
understand it, the current semantics of MSG_ZEROCOPY on TCP make it
close to unusable. The problem is that the remote party can move your
MSG_ZEROCOPY socket from ESTABLISHED to CLOSE_WAIT without your
involvement. This will mean that even though the program can still
send() data to the socket, MSG_ZEROCOPY operations will fail with
EINVAL.

In other words: because the socket needs to be ESTABLISHED for
MSG_ZEROCOPY to work, and because the remote party can send a FIN and
move the socket to CLOSE_WAIT, the sending party must implement a
fallback for the EINVAL return code in its transmit path. An adversarial
client that calls shutdown(SHUT_WR) will trigger EINVAL in the sender.
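
For illustration, the fallback described above could look roughly like the sketch below. It is not taken from any existing program: `send_zc_fallback`, `send_fn` and `fake_send_close_wait` are hypothetical names, and the send function is passed in as a pointer only so that the retry logic can be exercised without a real TCP peer. EINVAL can have other causes too, so a real sender may want to be more careful about when it retries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000 /* value from linux/socket.h, for older libc headers */
#endif

/* Same signature as send(2), so send itself can be passed in. */
typedef ssize_t (*send_fn)(int fd, const void *buf, size_t len, int flags);

/*
 * Hypothetical helper: try a zerocopy send first and, if the kernel
 * rejects it with EINVAL (for example because the socket has left
 * ESTABLISHED), retry the same data as an ordinary copying send.
 * *used_zerocopy reports whether the successful send was a zerocopy
 * one, i.e. whether a completion notification should be expected.
 */
static ssize_t send_zc_fallback(send_fn do_send, int fd, const void *buf,
                                size_t len, int flags, int *used_zerocopy)
{
    assert(do_send != NULL && used_zerocopy != NULL);

    ssize_t n = do_send(fd, buf, len, flags | MSG_ZEROCOPY);
    if (n >= 0 || errno != EINVAL) {
        /* Either it worked, or it failed for an unrelated reason. */
        *used_zerocopy = (n >= 0);
        return n;
    }

    /* Zerocopy refused: fall back to a normal send. */
    *used_zerocopy = 0;
    return do_send(fd, buf, len, flags & ~MSG_ZEROCOPY);
}

/*
 * Stand-in for send(2) that mimics the behaviour reported in this
 * thread: EINVAL whenever MSG_ZEROCOPY is set, success otherwise.
 */
static ssize_t fake_send_close_wait(int fd, const void *buf, size_t len,
                                    int flags)
{
    (void)fd;
    (void)buf;
    if (flags & MSG_ZEROCOPY) {
        errno = EINVAL;
        return -1;
    }
    return (ssize_t)len;
}
```

In real code the first argument would simply be `send`. The `used_zerocopy` flag matters because a send that fell back to copying does not produce a completion on the error queue, so the caller can reuse the buffer immediately.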

Marek



* Re: MSG_ZEROCOPY doesn't work on half-open TCP sockets
  2019-01-09 12:48 ` Marek Majkowski
@ 2019-01-09 13:55   ` Willem de Bruijn
  2019-01-09 14:29     ` Willy Tarreau
  0 siblings, 1 reply; 4+ messages in thread
From: Willem de Bruijn @ 2019-01-09 13:55 UTC (permalink / raw)
  To: Marek Majkowski; +Cc: Willem de Bruijn, Network Development

On Wed, Jan 9, 2019 at 7:50 AM Marek Majkowski <marek@cloudflare.com> wrote:
>
> I got it slightly wrong, and it's even worse than this. As far as I
> understand it, the current semantics of MSG_ZEROCOPY on TCP make it
> close to unusable. The problem is that the remote party can move your
> MSG_ZEROCOPY socket from ESTABLISHED to CLOSE_WAIT without your
> involvement. This will mean that even though the program can still
> send() data to the socket, MSG_ZEROCOPY operations will fail with
> EINVAL.
>
> In other words: because the socket needs to be ESTABLISHED for
> MSG_ZEROCOPY to work, and because the remote party can send a FIN and
> move the socket to CLOSE_WAIT, the sending party must implement a
> fallback for the EINVAL return code in its transmit path. An adversarial
> client that calls shutdown(SHUT_WR) will trigger EINVAL in the sender.

An adversarial client only affects its own stream, so the impact is limited.

>
> Marek
>
> On Wed, Jan 9, 2019 at 1:01 PM Marek Majkowski <marek@cloudflare.com> wrote:
> >
> > Hi,
> >
> > Current implementation of MSG_ZEROCOPY for TCP requires the socket to
> > be ESTABLISHED:
> > https://elixir.bootlin.com/linux/v5.0-rc1/source/net/ipv4/tcp.c#L1188
> >
> > if (sk->sk_state != TCP_ESTABLISHED) {
> >     err = -EINVAL;
> >     goto out_err;
> > }
> >
> > In TCP it's totally fine to have half-open sockets, for example:
> >
> > shutdown(5, SHUT_RD)
> >
> > Moves the socket from ESTABLISHED to CLOSE_WAIT. In such TCP state
> > it's possible to continue sending data. This is not supported by
> > MSG_ZEROCOPY, which will fail with EINVAL in such case. I think it's a
> > bug.

Thanks for the report. At first blush it seems like extending the
check to include state CLOSE_WAIT would resolve the issue:

        if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
-               if (sk->sk_state != TCP_ESTABLISHED) {
+               if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
                        err = -EINVAL;
                        goto out_err;
                }
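
To make the bitmask idiom in the proposed check easier to follow, here is a small userspace sketch of the same test. The `ST_*` names, `STF()` and `zerocopy_allowed()` are made up for this example; the numeric values are copied from the kernel's TCP state enum and renamed so they cannot clash with libc headers. Each state becomes one bit, and the send is rejected if that bit falls outside the allowed set.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the kernel's TCP state numbers, renamed for userspace. */
enum sketch_tcp_state {
    ST_ESTABLISHED = 1,
    ST_SYN_SENT,
    ST_SYN_RECV,
    ST_FIN_WAIT1,
    ST_FIN_WAIT2,
    ST_TIME_WAIT,
    ST_CLOSE,
    ST_CLOSE_WAIT,
    ST_LAST_ACK,
    ST_LISTEN,
    ST_CLOSING
};

/* One bit per state, mirroring how the TCPF_* flags relate to TCP_*. */
#define STF(state) (1u << (state))

/*
 * Returns true when the state is in the allowed set. The kernel check
 * is the inverse: it bails out with -EINVAL when the state's bit
 * survives masking with the complement of the allowed set.
 */
static bool zerocopy_allowed(enum sketch_tcp_state state)
{
    assert(state >= ST_ESTABLISHED && state <= ST_CLOSING);
    return (STF(state) & ~(STF(ST_ESTABLISHED) | STF(ST_CLOSE_WAIT))) == 0;
}
```

With this form, permitting another state later only means OR-ing one more flag into the mask.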


* Re: MSG_ZEROCOPY doesn't work on half-open TCP sockets
  2019-01-09 13:55   ` Willem de Bruijn
@ 2019-01-09 14:29     ` Willy Tarreau
  0 siblings, 0 replies; 4+ messages in thread
From: Willy Tarreau @ 2019-01-09 14:29 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Marek Majkowski, Willem de Bruijn, Network Development

On Wed, Jan 09, 2019 at 08:55:14AM -0500, Willem de Bruijn wrote:
> > In other words: because the socket needs to be ESTABLISHED for
> > MSG_ZEROCOPY to work, and because the remote party can send a FIN and
> > move the socket to CLOSE_WAIT, the sending party must implement a
> > fallback for the EINVAL return code in its transmit path. An adversarial
> > client that calls shutdown(SHUT_WR) will trigger EINVAL in the sender.
> 
> An adversarial client only affects its own stream, so the impact is limited.

Sure, but it doesn't necessarily do it on purpose either :-) The typical

   echo -ne "GET /file HTTP/1.1\r\nHost: foo.example.com\r\n\r\n" | nc host port

is perfectly valid and will not work in this case, possibly forcing the newly
deployed component to roll back.
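
As a rough illustration of that client pattern (send the request, close the write side, then read the reply), here is a self-contained sketch. It makes two simplifying assumptions: an AF_UNIX socketpair stands in for the TCP connection so it runs without a network, and `half_close_roundtrip` is a hypothetical name. It only shows that the peer can keep sending after receiving EOF; it does not exercise MSG_ZEROCOPY.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * The "client" writes a request and shuts down its write side; the
 * "server" reads until EOF and then still sends a reply. Returns the
 * number of reply bytes the client received, or -1 on error.
 */
static ssize_t half_close_roundtrip(char *resp, size_t resp_len)
{
    const char *request = "GET /file\r\n";
    const char *reply = "hello";
    char req[64];
    ssize_t n, got = -1;
    int sv[2];

    assert(resp != NULL && resp_len > 0);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int client = sv[0], server = sv[1];

    /* Client: send the request, then half-close (like stdin EOF in nc). */
    if (write(client, request, strlen(request)) < 0)
        goto out;
    if (shutdown(client, SHUT_WR) < 0)
        goto out;

    /* Server: drain the request until it sees EOF from the client. */
    while ((n = read(server, req, sizeof(req))) > 0)
        ;
    if (n < 0)
        goto out;

    /* Server: sending is still allowed after the peer's half-close. */
    if (write(server, reply, strlen(reply)) < 0)
        goto out;

    /* Client: its read side is still open, so the reply arrives. */
    got = read(client, resp, resp_len);
out:
    close(client);
    close(server);
    return got;
}
```

On a TCP connection, the server side of this exchange is the one that would find itself in CLOSE_WAIT when it sends the reply.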

> Thanks for the report. At first blush it seems like extending the
> check to include state CLOSE_WAIT would resolve the issue:
> 
>         if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
> -               if (sk->sk_state != TCP_ESTABLISHED) {
> +               if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
>                         err = -EINVAL;
>                         goto out_err;
>                 }

At first glance I think it should do the job.

Willy

