All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Cong Wang <xiyou.wangcong@gmail.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	duanxiongchun@bytedance.com,
	Dongdong Wang <wangdongdong.6@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Cong Wang <cong.wang@bytedance.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [Patch bpf-next v7 09/13] udp: implement ->read_sock() for sockmap
Date: Mon, 29 Mar 2021 23:23:15 -0700	[thread overview]
Message-ID: <6062c3d37db9e_600ea20898@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <CAM_iQpVgdP1w73skJ3W-MHkO-pPVKT7WM06Fqc35XkXjDcWf_Q@mail.gmail.com>

Cong Wang wrote:
> On Mon, Mar 29, 2021 at 1:54 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > This is similar to tcp_read_sock(), except we do not need
> > > to worry about connections, we just need to retrieve skb
> > > from UDP receive queue.
> > >
> > > Note, the return value of ->read_sock() is unused in
> > > sk_psock_verdict_data_ready().
> > >
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---

[...]

> > >  }
> > >  EXPORT_SYMBOL(__skb_recv_udp);
> > >
> > > +int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
> > > +               sk_read_actor_t recv_actor)
> > > +{
> > > +     int copied = 0;
> > > +
> > > +     while (1) {
> > > +             int offset = 0, err;
> >
> > Should this be
> >
> >  int offset = sk_peek_offset()?
> 
> What are you really suggesting? sk_peek_offset() is just 0 unless
> we have MSG_PEEK here and we don't, because we really want to
> dequeue the skb rather than peeking it.
> 
> Are you suggesting we should do peeking? I am afraid we can't.
> Please be specific, guessing your mind is not an effective way to
> address your reviews.

I was only asking for further details because the offset addition
below struck me as odd.

> 
> >
> > MSG_PEEK should work from recv side, at least it does on TCP side. If
> > its handled in some following patch a comment would be nice. I was
> > just reading udp_recvmsg() so maybe its not needed.
> 
> Please explain why do we need peeking in sockmap? At very least
> it has nothing to do with my patchset.

We need MSG_PEEK to work from application side. From sockmap
side I agree its not needed.

> 
> I do not know why you want to use TCP as a "standard" here, TCP
> also supports splice(), UDP still doesn't even with ->read_sock().
> Of course they are very different.

Not claiming any "standard" here only that user application needs
to work correctly if it passes MSG_PEEK.

> 
> >
> > > +             struct sk_buff *skb;
> > > +
> > > +             skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> > > +             if (!skb)
> > > +                     return err;
> > > +             if (offset < skb->len) {
> > > +                     size_t len;
> > > +                     int used;
> > > +
> > > +                     len = skb->len - offset;
> > > +                     used = recv_actor(desc, skb, offset, len);
> > > +                     if (used <= 0) {
> > > +                             if (!copied)
> > > +                                     copied = used;
> > > +                             break;
> > > +                     } else if (used <= len) {
> > > +                             copied += used;
> > > +                             offset += used;
> >
> > The while loop is going to zero this? What are we trying to do
> > here with offset?
> 
> offset only matters for MSG_PEEK and we do not support peeking
> in sockmap case, hence it is unnecessary here. I "use" it here just
> to make the code as complete as possible.

huh? If its not used the addition is just confusing. Can we drop it?

> 
> To further answer your question, it is set to 0 when we return a
> valid skb on line 201 inside __skb_try_recv_from_queue(), as
> "_off" is set to 0 and won't change unless we have MSG_PEEK.
> 
> 173         bool peek_at_off = false;
> 174         struct sk_buff *skb;
> 175         int _off = 0;
> 176
> 177         if (unlikely(flags & MSG_PEEK && *off >= 0)) {
> 178                 peek_at_off = true;
> 179                 _off = *off;
> 180         }
> 181
> 182         *last = queue->prev;
> 183         skb_queue_walk(queue, skb) {
> 184                 if (flags & MSG_PEEK) {
> 185                         if (peek_at_off && _off >= skb->len &&
> 186                             (_off || skb->peeked)) {
> 187                                 _off -= skb->len;
> 188                                 continue;
> 189                         }
> 190                         if (!skb->len) {
> 191                                 skb = skb_set_peeked(skb);
> 192                                 if (IS_ERR(skb)) {
> 193                                         *err = PTR_ERR(skb);
> 194                                         return NULL;
> 195                                 }
> 196                         }
> 197                         refcount_inc(&skb->users);
> 198                 } else {
> 199                         __skb_unlink(skb, queue);
> 200                 }
> 201                 *off = _off;
> 202                 return skb;
> 
> Of course, when we return NULL, we return immediately without
> using offset:
> 
> 1794                 skb = __skb_recv_udp(sk, 0, 1, &offset, &err);
> 1795                 if (!skb)
> 1796                         return err;
> 
> This should not be hard to figure out. Hope it is clear now.
> 

Yes, but tracking offset only to clear it a couple lines later
is confusing.

> Thanks.



  reply	other threads:[~2021-03-30  6:24 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-28 20:20 [Patch bpf-next v7 00/13] sockmap: introduce BPF_SK_SKB_VERDICT and support UDP Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 01/13] skmsg: lock ingress_skb when purging Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 02/13] skmsg: introduce a spinlock to protect ingress_msg Cong Wang
2021-03-29 19:11   ` John Fastabend
2021-03-28 20:20 ` [Patch bpf-next v7 03/13] net: introduce skb_send_sock() for sock_map Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 04/13] skmsg: avoid lock_sock() in sk_psock_backlog() Cong Wang
2021-03-29 19:41   ` John Fastabend
2021-03-28 20:20 ` [Patch bpf-next v7 05/13] skmsg: use rcu work for destroying psock Cong Wang
2021-03-29 19:42   ` John Fastabend
2021-03-28 20:20 ` [Patch bpf-next v7 06/13] skmsg: use GFP_KERNEL in sk_psock_create_ingress_msg() Cong Wang
2021-03-29 19:44   ` John Fastabend
2021-03-28 20:20 ` [Patch bpf-next v7 07/13] sock_map: introduce BPF_SK_SKB_VERDICT Cong Wang
2021-03-29 20:09   ` John Fastabend
2021-03-30  1:27     ` Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 08/13] sock: introduce sk->sk_prot->psock_update_sk_prot() Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 09/13] udp: implement ->read_sock() for sockmap Cong Wang
2021-03-29 20:54   ` John Fastabend
2021-03-30  5:39     ` Cong Wang
2021-03-30  6:23       ` John Fastabend [this message]
2021-03-30  6:36         ` Cong Wang
2021-03-30  6:45           ` John Fastabend
2021-03-28 20:20 ` [Patch bpf-next v7 10/13] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 11/13] udp: implement udp_bpf_recvmsg() for sockmap Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 12/13] sock_map: update sock type checks for UDP Cong Wang
2021-03-29 23:10   ` John Fastabend
2021-03-30  5:47     ` Cong Wang
2021-03-28 20:20 ` [Patch bpf-next v7 13/13] selftests/bpf: add a test case for udp sockmap Cong Wang
2021-03-28 23:27 ` [Patch bpf-next v7 00/13] sockmap: introduce BPF_SK_SKB_VERDICT and support UDP Alexei Starovoitov
2021-03-29 15:03   ` John Fastabend
2021-03-29 16:57     ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6062c3d37db9e_600ea20898@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=duanxiongchun@bytedance.com \
    --cc=jakub@cloudflare.com \
    --cc=jiang.wang@bytedance.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=wangdongdong.6@bytedance.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.