All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net-next v2 1/2] udp: msg_zerocopy
Date: Mon, 26 Nov 2018 14:49:07 -0500	[thread overview]
Message-ID: <CAF=yD-KoUX=suZaaV3xzixSM2Kds65YRxCR4HyXXkKp8=aN+XQ@mail.gmail.com> (raw)
In-Reply-To: <CAF=yD-KG+wyHkX40FXPXNiNEHeKjOd5hPSDFuh-z0pUAnLehLg@mail.gmail.com>

On Mon, Nov 26, 2018 at 1:19 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Nov 26, 2018 at 1:04 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Mon, 2018-11-26 at 12:59 -0500, Willem de Bruijn wrote:
> > > The callers of this function do flush the queue of the other skbs on
> > > error, but only after the call to sock_zerocopy_put_abort.
> > >
> > > sock_zerocopy_put_abort depends on total rollback to revert the
> > > sk_zckey increment and suppress the completion notification (which
> > > must not happen on return with error).
> > >
> > > I don't immediately have a fix. Need to think about this some more..
> >
> > [still out of sheer ignorance] How about tacking a refcnt for the whole
> > ip_append_data() scope, like in the tcp case? that will add an atomic
> > op per loop (likely, hitting the cache) but will remove some code hunk
> > in sock_zerocopy_put_abort() and sock_zerocopy_alloc().
>
> The atomic op pair is indeed what I was trying to avoid. But I also need
> to solve the problem that the final decrement will happen from the freeing
> of the other skbs in __ip_flush_pending_frames, and will not suppress
> the notification.
>
> Freeing the entire queue inside __ip_append_data, effectively making it
> a true noop on error is one approach. But that is invasive, also to non
> zerocopy codepaths, so I would rather avoid that.
>
> Perhaps I need to handle the abort logic in udp_sendmsg directly,
> after both __ip_append_data and __ip_flush_pending_frames.

Actually,

(1) the notification suppression is handled correctly, as .._abort
decrements uarg->len. If now zero, this suppresses the notification
in sock_zerocopy_callback, regardless whether that callback is
called right away or from a later kfree_skb.

(2) if moving skb_zcopy_set below getfrag, then no kfree_skb
will be called on a zerocopy skb inside __ip_append_data. So on
abort the refcount is exactly the number of zerocopy skbs on the
queue that will call sock_zerocopy_put later. Abort then only needs
to handle special case zero, and call sock_zerocopy_put right away.

Tentative fix on top of v2 (I'll squash into v3):

---

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2179ef84bb44..4b21a58329d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1099,12 +1099,13 @@ void sock_zerocopy_put_abort(struct ubuf_info *uarg)

-               if (sk->sk_type != SOCK_STREAM && !refcount_read(&uarg->refcnt))
+               if (sk->sk_type != SOCK_STREAM &&
!refcount_read(&uarg->refcnt)) {
                        refcount_set(&uarg->refcnt, 1);
-
-               sock_zerocopy_put(uarg);
+                       sock_zerocopy_put(uarg);
+               }
        }
 }

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7504da2f33d6..a19396e21b35 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1014,13 +1014,6 @@ static int __ip_append_data(struct sock *sk,
                        skb->csum = 0;
                        skb_reserve(skb, hh_len);

-                       /* only the initial fragment is time stamped */
-                       skb_shinfo(skb)->tx_flags = cork->tx_flags;
-                       cork->tx_flags = 0;
-                       skb_shinfo(skb)->tskey = tskey;
-                       tskey = 0;
-                       skb_zcopy_set(skb, uarg);
-
                        /*
                         *      Find where to start putting bytes.
                         */
@@ -1053,6 +1046,13 @@ static int __ip_append_data(struct sock *sk,
                        exthdrlen = 0;
                        csummode = CHECKSUM_NONE;

+                       /* only the initial fragment is time stamped */
+                       skb_shinfo(skb)->tx_flags = cork->tx_flags;
+                       cork->tx_flags = 0;
+                       skb_shinfo(skb)->tskey = tskey;
+                       tskey = 0;
+                       skb_zcopy_set(skb, uarg);
+
                        if ((flags & MSG_CONFIRM) && !skb_prev)
                                skb_set_dst_pending_confirm(skb, 1);

---

This patch moves all the skb_shinfo touches operations after the copy,
to avoid touching that twice.

Instead of the refcnt trick, I could also refactor sock_zerocopy_put
and call __sock_zerocopy_put

---

-void sock_zerocopy_put(struct ubuf_info *uarg)
+static void __sock_zerocopy_put(struct ubuf_info *uarg)
 {
-       if (uarg && refcount_dec_and_test(&uarg->refcnt)) {
-               if (uarg->callback)
-                       uarg->callback(uarg, uarg->zerocopy);
-               else
-                       consume_skb(skb_from_uarg(uarg));
-       }
+       if (uarg->callback)
+               uarg->callback(uarg, uarg->zerocopy);
+       else
+               consume_skb(skb_from_uarg(uarg));
+}
+
+void sock_zerocopy_put(struct ubuf_info *uarg)
+       if (uarg && refcount_dec_and_test(&uarg->refcnt))
+               __sock_zerocopy_put(uarg);
 }
 EXPORT_SYMBOL_GPL(sock_zerocopy_put);

---

  reply	other threads:[~2018-11-27  6:44 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-26 15:29 [PATCH net-next v2 0/2] udp msg_zerocopy Willem de Bruijn
2018-11-26 15:29 ` [PATCH net-next v2 1/2] udp: msg_zerocopy Willem de Bruijn
2018-11-26 16:32   ` Paolo Abeni
2018-11-26 17:59     ` Willem de Bruijn
2018-11-26 18:04       ` Paolo Abeni
2018-11-26 18:19         ` Willem de Bruijn
2018-11-26 19:49           ` Willem de Bruijn [this message]
2018-11-28 23:50             ` Willem de Bruijn
2018-11-29  8:27               ` Paolo Abeni
2018-11-29 16:17                 ` Willem de Bruijn
2018-11-29  7:31   ` [udp] a4a142d3d7: WARNING:at_lib/refcount.c:#refcount_inc_checked kernel test robot
2018-11-29  7:31     ` kernel test robot
2018-11-26 15:29 ` [PATCH net-next v2 2/2] selftests: extend zerocopy tests to udp Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF=yD-KoUX=suZaaV3xzixSM2Kds65YRxCR4HyXXkKp8=aN+XQ@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.