From: Neil Horman <nhorman@tuxdriver.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	Matteo Croce <mcroce@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v4 net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Date: Wed, 26 Jun 2019 13:14:39 -0400
Message-ID: <20190626171439.GB31355@hmswarspite.think-freely.org>
In-Reply-To: <CAF=yD-+_khMRCK0gE2q7nAi8fAtwvZ2FerHZKo1U1M-=991+Zg@mail.gmail.com>

On Wed, Jun 26, 2019 at 11:05:39AM -0400, Willem de Bruijn wrote:
> On Wed, Jun 26, 2019 at 6:54 AM Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > On Tue, Jun 25, 2019 at 06:30:08PM -0400, Willem de Bruijn wrote:
> > > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > > index a29d66da7394..a7ca6a003ebe 100644
> > > > --- a/net/packet/af_packet.c
> > > > +++ b/net/packet/af_packet.c
> > > > @@ -2401,6 +2401,9 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
> > > >
> > > >                 ts = __packet_set_timestamp(po, ph, skb);
> > > >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
> > > > +
> > > > +               if (!packet_read_pending(&po->tx_ring))
> > > > +                       complete(&po->skb_completion);
> > > >         }
> > > >
> > > >         sock_wfree(skb);
> > > > @@ -2585,7 +2588,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
> > > >
> > > >  static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > > >  {
> > > > -       struct sk_buff *skb;
> > > > +       struct sk_buff *skb = NULL;
> > > >         struct net_device *dev;
> > > >         struct virtio_net_hdr *vnet_hdr = NULL;
> > > >         struct sockcm_cookie sockc;
> > > > @@ -2600,6 +2603,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > > >         int len_sum = 0;
> > > >         int status = TP_STATUS_AVAILABLE;
> > > >         int hlen, tlen, copylen = 0;
> > > > +       long timeo = 0;
> > > >
> > > >         mutex_lock(&po->pg_vec_lock);
> > > >
> > > > @@ -2646,12 +2650,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > > >         if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
> > > >                 size_max = dev->mtu + reserve + VLAN_HLEN;
> > > >
> > > > +       reinit_completion(&po->skb_completion);
> > > > +
> > > >         do {
> > > >                 ph = packet_current_frame(po, &po->tx_ring,
> > > >                                           TP_STATUS_SEND_REQUEST);
> > > >                 if (unlikely(ph == NULL)) {
> > > > -                       if (need_wait && need_resched())
> > > > -                               schedule();
> > > > +                       if (need_wait && skb) {
> > > > +                               timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
> > > > +                               timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
> > >
> > > This looks really nice.
> > >
> > > But isn't it still susceptible to the race where tpacket_destruct_skb
> > > is called in between po->xmit and this
> > > wait_for_completion_interruptible_timeout?
> > >
> > That's not an issue, since the complete is only gated on packet_read_pending
> > reaching 0 in tpacket_destruct_skb.  Previously it was gated on my
> > wait_on_complete flag being non-zero, so we had to set that prior to calling
> > po->xmit, or the complete call might never get made, resulting in a hang.  Now,
> > we always call complete, and the completion API allows for arbitrary
> > ordering of complete/wait_for_complete (since its internal done counter gets
> > incremented), making a call to wait_for_complete effectively a fall-through if
> > complete gets called first.
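
To make the ordering property concrete, here's a minimal kernel-style
sketch (illustrative only, not from the patch; the names demo_done,
producer and consumer are made up): complete() bumps the completion's
internal done count, so a wait issued after the complete just consumes
a count and falls through instead of blocking.

	#include <linux/completion.h>

	static DECLARE_COMPLETION(demo_done);

	/* e.g. invoked from an skb destructor */
	static void producer(void)
	{
		complete(&demo_done);	/* done: 0 -> 1 */
	}

	/* e.g. invoked later from process context */
	static void consumer(void)
	{
		/*
		 * If producer() already ran, this consumes the pending
		 * count and returns immediately; otherwise it sleeps
		 * until complete() is called.
		 */
		wait_for_completion(&demo_done);
	}
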
> 
> Perfect! I was not aware that it handles that internally. Hadn't read
> do_wait_for_common closely before.
> 
> > There is an odd path here though.  If an application calls sendmsg on a packet
> > socket with MSG_DONTWAIT set, then need_wait will be zero, and we will
> > eventually exit this loop without ever having called wait_for_complete, but
> > tpacket_destruct_skb will still have called complete when all the frames
> > complete transmission.  In and of itself, that's fine, but it leaves the
> > completion structure in a state where its done counter will have been
> > incremented at least once (specifically, it will be set to N, where N is the
> > number of frames transmitted during the call where MSG_DONTWAIT is set).  If the
> > application then calls sendmsg on this socket with MSG_DONTWAIT clear, we will
> > call wait_for_complete but immediately return from it (due to the previously
> > made calls to complete).  I've corrected this, however, by adding that call to
> > reinit_completion prior to loop entry, so that we are always guaranteed to
> > have the completion counter set properly to wait for only the frames being sent
> > in this particular instance of the sendmsg call.
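
Concretely, that just means the reinit lands at the top of tpacket_snd,
ahead of the transmit loop (a sketch paraphrasing the v4 patch above,
with unrelated code elided):

	mutex_lock(&po->pg_vec_lock);
	...
	/*
	 * Discard any completion counts left over from an earlier
	 * MSG_DONTWAIT send, so the wait below reflects only frames
	 * queued by this call.
	 */
	reinit_completion(&po->skb_completion);

	do {
		ph = packet_current_frame(po, &po->tx_ring,
					  TP_STATUS_SEND_REQUEST);
		...
	} while (...);
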
> 
> Yep, understood.
> 
> >
> > > The test for skb is shorthand for packet_read_pending  != 0, right?
> > >
> > Sort of.  Gating on skb guarantees for us that we have sent at least one frame
> > in this call to tpacket_snd.  If we didn't do that, then it would be possible
> > for an application to call sendmsg without setting any frames in the buffer to
> > TP_STATUS_SEND_REQUEST, which would cause us to wait for a completion without
> > having sent any frames, meaning we would block waiting for an event
> > (tpacket_destruct_skb) that will never happen.  The check for skb ensures that
> > tpacket_destruct_skb will get called, and that our call to wait_for_complete
> > will get a wakeup.  It does suggest that packet_read_pending != 0, but that's not
> > guaranteed, because tpacket_destruct_skb may already have been called (see the
> > above explanation regarding ordering of complete/wait_for_complete).
> 
> But the inverse is true: if the sleep is gated on packet_read_pending,
> the process only ever waits if a packet is still to be acknowledged.
> Then both the wait and the wakeup clearly depend on the same state.
> 
> Either way works, I think. So this is definitely fine.
> 
Yeah, we could do that.  It's basically a pick-your-poison situation.  In the case
you stipulate, we could gate the wait_for_complete on read_pending being
non-zero, but if a frame frees quickly and decrements the pending count, we
still leave the loop with a completion struct that needs to be reset.  Either
way we have to re-init the completion.  As for calling wait_for_complete when
we don't have to, your proposal does solve that problem, but it requires that we
call packet_read_pending an extra time on every iteration of the loop.
packet_read_pending accumulates the sum of all the per-cpu pending
counts (which is a separate problem; I'm not sure why we're using per-cpu
counters there).  Regardless, I looked at that and (anecdotally) decided that
periodically calling wait_for_complete, which takes a spinlock, would be more
performant than accessing a per-cpu variable on every available cpu on each
iteration of the loop (based on the comments at the bottom of the loop).
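
For reference, packet_read_pending looks roughly like this (paraphrased
from net/packet/af_packet.c); note that every read walks every possible
cpu:

	static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
	{
		unsigned int refcnt = 0;
		int cpu;

		/* The rx_ring doesn't use a pending refcount. */
		if (rb->pending_refcnt == NULL)
			return 0;

		for_each_possible_cpu(cpu)
			refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);

		return refcnt;
	}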

> One possible refinement would be to keep po->wait_on_complete (but
> rename it po->wake_on_complete), set it before entering the loop and
> clear it before function return (both within the pg_vec_lock critical
> section). And test that in tpacket_destruct_skb to avoid calling
> complete if MSG_DONTWAIT. But I don't think it's worth the complexity.
> 
I agree, we could use a socket variable to communicate to tpacket_destruct_skb
that we need to call complete, in conjunction with the pending count, but I
don't think the added complexity buys us anything.
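
For the record, I take it that refinement would look something like the
following (hypothetical sketch; no wake_on_complete field exists in the
patch as posted):

	/* hypothetical new field in struct packet_sock */
	unsigned int wake_on_complete;

	/* in tpacket_snd(), inside the pg_vec_lock critical section */
	po->wake_on_complete = !(msg->msg_flags & MSG_DONTWAIT);
	...
	po->wake_on_complete = 0;	/* cleared before returning */

	/* in tpacket_destruct_skb() */
	if (po->wake_on_complete && !packet_read_pending(&po->tx_ring))
		complete(&po->skb_completion);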

> One rare edge case is a MSG_DONTWAIT send followed by a !MSG_DONTWAIT.
> It is then possible for a tpacket_destruct_skb to be run as a result
> from the first call, during the second call, after the call to
> reinit_completion. That would cause the next wait to return before
> *its* packets have been sent. But due to the packet_read_pending test
> in the while () condition it will loop again and return to wait. So that's fine.
> 
yup, exactly.
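
For anyone following along, it's the exit condition at the bottom of the
send loop that catches this case; roughly, from af_packet.c:

	/*
	 * Even if a stale complete() lets the wait above return
	 * early, a non-zero pending count sends us around the loop
	 * and back into the wait until this call's frames are done.
	 */
	} while (likely((ph != NULL) ||
			(need_wait && packet_read_pending(&po->tx_ring))));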

> Thanks for bearing with me.
> 
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> 

