netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	netdev@vger.kernel.org, Marek Majkowski <marek@cloudflare.com>
Subject: Re: [PATCH net] sk_msg: Keep reference on socket file while psock lives
Date: Mon, 27 May 2019 11:27:31 +0200	[thread overview]
Message-ID: <87k1ecz1q4.fsf@cloudflare.com> (raw)
In-Reply-To: <5ce81306aacbe_39402ae86c50a5bc2f@john-XPS-13-9360.notmuch>

On Fri, May 24, 2019 at 05:51 PM CEST, John Fastabend wrote:
> Jakub Sitnicki wrote:
>>
>> Now that those pesky crashes are gone, we plan to look into drops when
>> doing echo with sockmap. Marek tried running echo-sockmap [1] with
>> latest bpf-next (plus mentioned crash fixes) and reports that not all
>> data bounces back:
>>
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 971832
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 867352
>> $ yes| head -c $[1024*1024] | nc -q2 192.168.1.33 1234 |wc -c
>> 952648
>>
>> I'm tring to turn echo-sockmap into a selftest but as you can probably
>> guess over loopback all works fine.
>>
>
> Right, sockmap when used from recvmsg with redirect is lossy. This
> was a design choice I made that apparently caught a few people
> by surprise. The original rationale for this was when doing a
> multiplex operation, e.g. single ingress socket to many egress
> sockets blocking would cause head of line blocking on all
> sockets. To resolve this I simply dropped the packet and then allow
> the flow to continue. This pushes the logic up to the application
> to do retries, etc. when this happens. FWIW userspace proxies I
> tested also had similar points where they fell over and dropped
> packets. In hind sight though it probably would have made more
> sense to make this behavior opt-in vs the default. But, the
> use case I was solving at the time I wrote this could handle
> drops and was actually a NxM sockets with N ingress sockets and M
> egress sockets so head of line blocking was a real problem.
>
> Adding a flag to turn this into a blocking op has been on my
> todo list for awhile. Especially when sockmap is being used as
> a single ingress to single egress socket then blocking vs dropping
> makes much more sense.
>
> The place to look is in sk_psock_verdict_apply() in __SK_REDIRECT
> case there is a few checks and notice we can fallthrough to a
> kfree_skb(skb). This is most likely the drops you are hitting.
> Maybe annotate it with a dbg statement to check.
>
> To fix this we could have a flag to _not_ drop but enqueue the
> packet regardless of the test or hold it until space is
> available. I even think sk_psock_strp_read could push back
> on the stream parser which would eventually push back via TCP
> and get the behavior you want.
>
> Also, I have a couple items on my TODO list that I'll eventually
> get to. First we run without stream parsers in some Cilium
> use cases. I'll push some patches to allow this in the next
> months or so. This avoids the annoying stream parser prog that
> simply returns skb->len. This is mostly an optimizations. A
> larger change I want to make at some point is to remove the
> backlog workqueue altogether. Originally it was added for
> simplicity but actually causes some latency spikes when
> looking at 99+ percentiles. It really doesn't need to be
> there it was a hold over from some original architecture that
> got pushed upstream. If you have time and want to let me know
> if you would like to tackle removing it.

This is great stuff. Thanks for explaining the sockmap's design and
decisions behind it. The opt-in blocking mode idea is spot on.

I imagine we'll get back to sockmap once we have a replacement for
TPROXY figured out (unrelated to sockmap). Let's sync then.

-Jakub

      reply	other threads:[~2019-05-27  9:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11  9:09 [PATCH net] sk_msg: Keep reference on socket file while psock lives Jakub Sitnicki
2019-02-19 16:00 ` Daniel Borkmann
2019-02-20  9:47   ` Jakub Sitnicki
2019-05-21 20:07     ` John Fastabend
2019-05-22 11:14       ` Jakub Sitnicki
2019-05-23 15:58         ` John Fastabend
2019-05-24  8:05           ` Jakub Sitnicki
2019-05-24 15:51             ` John Fastabend
2019-05-27  9:27               ` Jakub Sitnicki [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k1ecz1q4.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).