From: Neil Horman <nhorman@tuxdriver.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	Matteo Croce <mcroce@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v2 net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Date: Sun, 23 Jun 2019 15:21:23 -0400
Message-ID: <20190623192123.GA32287@hmswarspite.think-freely.org>
In-Reply-To: <CAF=yD-L5Lu6L4Ji=OZgAkDb28zL=BVsM5HgqWMxMTiJ1YUZJDw@mail.gmail.com>

On Sun, Jun 23, 2019 at 10:39:12AM -0400, Willem de Bruijn wrote:
> On Sun, Jun 23, 2019 at 7:40 AM Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > On Sat, Jun 22, 2019 at 10:21:31PM -0400, Willem de Bruijn wrote:
> > > > > -static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> > > > > +static void __packet_set_status(struct packet_sock *po, void *frame, int status,
> > > > > +                               bool call_complete)
> > > > >  {
> > > > >         union tpacket_uhdr h;
> > > > >
> > > > > @@ -381,6 +382,8 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
> > > > >                 BUG();
> > > > >         }
> > > > >
> > > > > +       if (po->wait_on_complete && call_complete)
> > > > > +               complete(&po->skb_completion);
> > > >
> > > > This wake need not happen before the barrier. Only one caller of
> > > > __packet_set_status passes call_complete (tpacket_destruct_skb).
> > > > Moving this branch to the caller avoids a lot of code churn.
> > > >
> > > > Also, multiple packets may be released before the process is awoken.
> > > > The process will block until packet_read_pending drops to zero. Can
> > > > defer the wait_on_complete to that one instance.
> > >
> > > Eh no. The point of having this sleep in the send loop is that
> > > additional slots may be released for transmission (flipped to
> > > TP_STATUS_SEND_REQUEST) from another thread while this thread is
> > > waiting.
> > >
> > Thats incorrect.  The entirety of tpacket_snd is protected by a mutex. No other
> > thread can alter the state of the frames in the vector from the kernel send path
> > while this thread is waiting.
> 
> I meant another user thread updating the memory mapped ring contents.
> 
Yes, that's true, and if that happens we will loop through this path again (the
do..while section), picking up the next frame for transmit.
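To make the loop behavior concrete, here is a minimal userspace model of the tx
ring scan (helper name and ring layout are hypothetical; the status values
mirror linux/if_packet.h). Each pass of the do..while can pick up a frame that
another user thread flipped to TP_STATUS_SEND_REQUEST while this thread was
waiting:

```c
#include <assert.h>

/* Status values as in linux/if_packet.h */
#define TP_STATUS_AVAILABLE     0
#define TP_STATUS_SEND_REQUEST  1
#define TP_STATUS_SENDING       2

/* Hypothetical model of the per-iteration frame lookup: return the index
 * of the next frame queued for transmit at or after *head, mark it as
 * being sent, and advance *head past it; return -1 if nothing is ready. */
static int next_sendable_frame(int *status, int nframes, int *head)
{
    int i;

    for (i = 0; i < nframes; i++) {
        int idx = (*head + i) % nframes;

        if (status[idx] == TP_STATUS_SEND_REQUEST) {
            status[idx] = TP_STATUS_SENDING;  /* claimed by this pass */
            *head = (idx + 1) % nframes;
            return idx;
        }
    }
    return -1;  /* nothing queued; real code may wait here */
}
```

A frame flipped to TP_STATUS_SEND_REQUEST between two passes is found on the
next pass, which is the behavior the thread relies on after waking.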

> > > Else, it would have been much simpler to move the wait below the send
> > > loop: send as many packets as possible, then wait for all of them
> > > having been released. Much clearer control flow.
> > >
> > Thats (almost) what happens now.  The only difference is that with this
> > implementation, the waiting thread has the opportunity to see if userspace has
> > queued more frames for transmission during the wait period.  We could
> > potentially change that, but thats outside the scope of this fix.
> 
> Agreed. I think the current, more complex, behavior was intentional.
> We could still restructure to move it out of the loop and jump back.
> But, yes, definitely out of scope for a fix.
> 
Yes, it was, though based on your comments I've moved the wait_for_completion
call to the bottom of the loop, so it's only checked after we are guaranteed to
have sent at least one frame.  I think that makes the code a bit more legible.

> > > Where to set and clear the wait_on_complete boolean remains. Integer
> > > assignment is fragile, as the compiler and processor may optimize or
> > > move simple seemingly independent operations. As complete() takes a
> > > spinlock, avoiding that in the DONTWAIT case is worthwhile. But probably
> > > still preferable to set when beginning waiting and clear when calling
> > > complete.
> > We avoid any call to wait_for_complete or complete already, based on the gating
> > of the need_wait variable in tpacket_snd.  If the transmitting thread doesn't
> > set MSG_DONTWAIT in the flags of the msg structure, we will never set
> > wait_for_complete, and so we will never manipulate the completion queue.
> 
> But we don't know the state of this at tpacket_destruct_skb time without
> wait_for_completion?
> 
Sure we do: wait_on_complete is stored in the packet_sock structure, which is
available and stable at the time tpacket_destruct_skb is called.
po->wait_on_complete is set in tpacket_snd iff:
1) The MSG_DONTWAIT flag is clear
and
2) We have detected that the next frame in the memory-mapped buffer does not
have its status set to TP_STATUS_SEND_REQUEST.

If those two conditions are true, we set po->wait_on_complete to 1, which
indicates that tpacket_destruct_skb should call complete() when all the frames
we've sent to the physical layer have been freed (i.e. when packet_read_pending
reaches zero).
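The two-condition gate above can be restated as a tiny predicate (a hedged
userspace sketch; the function name is hypothetical, and the constants mirror
the values in sys/socket.h and linux/if_packet.h):

```c
#include <assert.h>

#define MSG_DONTWAIT_FLAG      0x40  /* MSG_DONTWAIT on Linux */
#define TP_STATUS_SEND_REQUEST 1     /* from linux/if_packet.h */

/* Hypothetical restatement of the conditions under which tpacket_snd
 * arms po->wait_on_complete: the caller is willing to block (1), and
 * no further frame is queued for transmit (2). */
static int should_arm_completion(int msg_flags, int next_frame_status)
{
    return !(msg_flags & MSG_DONTWAIT_FLAG) &&
           next_frame_status != TP_STATUS_SEND_REQUEST;
}
```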

If wait_on_complete is non-zero, we can also be confident that the
calling task is either:
a) Already blocking in wait_for_completion_interruptible_timeout
or
b) About to wait on it shortly

In case (a) the blocking/transmitting task will be woken up and continue on its
way.

In case (b) the transmitting task will call
wait_for_completion_interruptible_timeout, see that the completion has already
been signaled (based on the completion struct's done counter being positive),
and return immediately.
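The reason case (b) is safe is that a completion carries persistent state, not
just a wakeup. A minimal userspace analogue using pthreads (a sketch only; the
real kernel API lives in linux/completion.h) shows why a late waiter cannot
block forever:

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of a kernel struct completion: complete()
 * increments a persistent "done" count, so a waiter arriving *after*
 * the completion fires still returns immediately. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;                      /* state persists past the wakeup */
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)            /* skipped entirely if already done */
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}
```

A bare condition variable would lose a signal delivered before the wait; the
done counter is what closes that race.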

I've made a slight update to the logic and comments in my next version to make
that a little clearer.

Neil



Thread overview: 34+ messages
2019-06-19 20:25 [PATCH net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET Neil Horman
2019-06-20 13:41 ` Willem de Bruijn
2019-06-20 14:01   ` Matteo Croce
2019-06-20 14:23   ` Neil Horman
2019-06-20 15:16     ` Willem de Bruijn
2019-06-20 16:14       ` Neil Horman
2019-06-20 16:18         ` Willem de Bruijn
2019-06-20 17:31           ` Neil Horman
2019-06-21 16:41       ` Neil Horman
2019-06-21 18:31         ` Willem de Bruijn
2019-06-21 19:18           ` Neil Horman
2019-06-21 20:06             ` Willem de Bruijn
2019-06-22 11:08               ` Neil Horman
2019-06-22 17:41 ` [PATCH v2 " Neil Horman
2019-06-23  2:12   ` Willem de Bruijn
2019-06-23  2:21     ` Willem de Bruijn
2019-06-23 11:40       ` Neil Horman
2019-06-23 14:39         ` Willem de Bruijn
2019-06-23 19:21           ` Neil Horman [this message]
2019-06-23 11:34     ` Neil Horman
2019-06-24  0:46 ` [PATCH v3 " Neil Horman
2019-06-24 18:08   ` Willem de Bruijn
2019-06-24 21:51     ` Neil Horman
2019-06-24 22:15       ` Willem de Bruijn
2019-06-25 11:02         ` Neil Horman
2019-06-25 13:37           ` Willem de Bruijn
2019-06-25 16:20             ` Neil Horman
2019-06-25 21:59               ` Willem de Bruijn
2019-06-25 21:57 ` [PATCH v4 " Neil Horman
2019-06-25 22:30   ` Willem de Bruijn
2019-06-26 10:54     ` Neil Horman
2019-06-26 15:05       ` Willem de Bruijn
2019-06-26 17:14         ` Neil Horman
2019-06-27  2:38   ` David Miller
