From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Network Development <netdev@vger.kernel.org>,
	Matteo Croce <mcroce@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v4 net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Date: Wed, 26 Jun 2019 11:05:39 -0400
Message-ID: <CAF=yD-+_khMRCK0gE2q7nAi8fAtwvZ2FerHZKo1U1M-=991+Zg@mail.gmail.com>
In-Reply-To: <20190626105403.GA31355@hmswarspite.think-freely.org>

On Wed, Jun 26, 2019 at 6:54 AM Neil Horman <nhorman@tuxdriver.com> wrote:
>
> On Tue, Jun 25, 2019 at 06:30:08PM -0400, Willem de Bruijn wrote:
> > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > index a29d66da7394..a7ca6a003ebe 100644
> > > --- a/net/packet/af_packet.c
> > > +++ b/net/packet/af_packet.c
> > > @@ -2401,6 +2401,9 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
> > >
> > >                 ts = __packet_set_timestamp(po, ph, skb);
> > >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
> > > +
> > > +               if (!packet_read_pending(&po->tx_ring))
> > > +                       complete(&po->skb_completion);
> > >         }
> > >
> > >         sock_wfree(skb);
> > > @@ -2585,7 +2588,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
> > >
> > >  static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >  {
> > > -       struct sk_buff *skb;
> > > +       struct sk_buff *skb = NULL;
> > >         struct net_device *dev;
> > >         struct virtio_net_hdr *vnet_hdr = NULL;
> > >         struct sockcm_cookie sockc;
> > > @@ -2600,6 +2603,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >         int len_sum = 0;
> > >         int status = TP_STATUS_AVAILABLE;
> > >         int hlen, tlen, copylen = 0;
> > > +       long timeo = 0;
> > >
> > >         mutex_lock(&po->pg_vec_lock);
> > >
> > > @@ -2646,12 +2650,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >         if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
> > >                 size_max = dev->mtu + reserve + VLAN_HLEN;
> > >
> > > +       reinit_completion(&po->skb_completion);
> > > +
> > >         do {
> > >                 ph = packet_current_frame(po, &po->tx_ring,
> > >                                           TP_STATUS_SEND_REQUEST);
> > >                 if (unlikely(ph == NULL)) {
> > > -                       if (need_wait && need_resched())
> > > -                               schedule();
> > > +                       if (need_wait && skb) {
> > > +                               timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
> > > +                               timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
> >
> > This looks really nice.
> >
> > But isn't it still susceptible to the race where tpacket_destruct_skb
> > is called in between po->xmit and this
> > wait_for_completion_interruptible_timeout?
> >
> That's not an issue, since the complete() call is gated only on packet_read_pending
> reaching 0 in tpacket_destruct_skb. Previously it was gated on my
> wait_on_complete flag being non-zero, so we had to set that flag before calling
> po->xmit, or the complete() call might never be made, resulting in a hang. Now
> we always call complete(), and the completion API tolerates either ordering of
> complete() and wait_for_completion() (its internal done counter gets
> incremented), so a call to wait_for_completion() effectively falls through if
> complete() was called first.

Perfect! I was not aware that it handles that internally. Hadn't read
do_wait_for_common closely before.
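A minimal sketch of the ordering argument above, using a hypothetical single-threaded userspace model rather than the kernel API: struct completion_model, model_complete() and model_try_wait() are illustrative names. Only the completion's done counter is modeled (the spinlock and wait queue are left out), and model_try_wait() reports "would block" instead of sleeping, roughly like try_wait_for_completion().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of struct completion: only the 'done' counter.
 * No wait queue exists, so waiting is reported as "would block". */
struct completion_model {
	unsigned int done;
};

/* Like complete(): record one event, even if nobody is waiting yet. */
static void model_complete(struct completion_model *c)
{
	c->done++;
}

/* Like try_wait_for_completion(): consume a recorded event if there is
 * one. Returns false when a real waiter would have to sleep. */
static bool model_try_wait(struct completion_model *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* complete() before the wait: the wait falls through without sleeping. */
static void complete_then_wait(void)
{
	struct completion_model c = { 0 };

	model_complete(&c);           /* destructor finishes first */
	assert(model_try_wait(&c));   /* waiter returns immediately */
	assert(!model_try_wait(&c));  /* the event was consumed */
}
```

Because complete() records the event in done rather than only waking tasks that are already asleep, the wait in tpacket_snd() cannot miss a destructor that ran between po->xmit and the wait.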

> There is an odd path here, though. If an application calls sendmsg on a packet
> socket with MSG_DONTWAIT set, need_wait will be zero, and we will
> eventually exit this loop without ever having called wait_for_completion, but
> tpacket_destruct_skb will still call complete when the frames
> finish transmission. In and of itself that's fine, but it leaves the
> completion structure with its done counter incremented at least once
> (once each time packet_read_pending drops to zero, i.e. up to N times, where
> N is the number of frames transmitted during the MSG_DONTWAIT call). If the
> application then calls sendmsg on this socket with MSG_DONTWAIT clear, we will
> call wait_for_completion but return from it immediately (due to those earlier
> calls to complete). I've corrected this, however, by adding the call to
> reinit_completion before loop entry, so that we are always guaranteed to
> have the completion set up to wait only for the frames being sent
> in this particular sendmsg call.

Yep, understood.
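The stale-count problem and the reinit_completion() fix can be replayed in the same kind of hypothetical model (struct tx_model, dontwait_send() and wait_returns_early() are illustrative, not kernel functions). It assumes each MSG_DONTWAIT frame is acknowledged before the next one is queued, which is the case where complete() runs once per frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 'pending' stands in for
 * packet_read_pending(&po->tx_ring), 'done' for po->skb_completion.done. */
struct tx_model {
	unsigned int pending;
	unsigned int done;
};

/* Models tpacket_destruct_skb(): complete() once pending drops to zero. */
static void destruct_one(struct tx_model *m)
{
	if (--m->pending == 0)
		m->done++;
}

/* Models a MSG_DONTWAIT sendmsg(): nobody waits. Assumes each frame is
 * acknowledged before the next is queued. */
static void dontwait_send(struct tx_model *m, unsigned int frames)
{
	for (unsigned int i = 0; i < frames; i++) {
		m->pending++;
		destruct_one(m);
	}
}

/* Models the start of a blocking sendmsg(): optional reinit_completion(),
 * queue one frame, then wait before that frame is acknowledged.
 * Returns true if the wait would return early on a stale event. */
static bool wait_returns_early(struct tx_model *m, bool reinit)
{
	if (reinit)
		m->done = 0;      /* reinit_completion() */
	m->pending++;         /* our frame is in flight */
	if (m->done == 0)
		return false;     /* would sleep until our destructor runs */
	m->done--;            /* consumes an event from the earlier call */
	return true;
}

static void stale_completion_scenario(void)
{
	struct tx_model without = { 0, 0 }, with = { 0, 0 };

	dontwait_send(&without, 3);
	assert(without.done == 3);                    /* leftover events */
	assert(wait_returns_early(&without, false));  /* no reinit: early return */

	dontwait_send(&with, 3);
	assert(!wait_returns_early(&with, true));     /* reinit: waits for own frame */
}
```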

>
> > The test for skb is shorthand for packet_read_pending != 0, right?
> >
> Sort of. Gating on skb guarantees that we have sent at least one frame
> in this call to tpacket_snd. Without it, an application could call
> sendmsg without setting any frames in the buffer to
> TP_STATUS_SEND_REQUEST, which would cause us to wait for a completion without
> having sent any frames, i.e. block waiting for an event
> (tpacket_destruct_skb) that will never happen. The check for skb ensures that
> tpacket_destruct_skb will get called, and that we will get a wakeup from its
> call to complete. It does suggest that packet_read_pending != 0, but that's not
> guaranteed, because tpacket_destruct_skb may already have been called (see the
> above explanation regarding the ordering of complete/wait_for_completion).

But the converse holds: if sleeping is gated on packet_read_pending,
the process only ever waits while a packet is still to be acknowledged.
Then both the wait and the wakeup clearly depend on the same state.

Either way works, I think. So this is definitely fine.
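The two gating choices can be compared as plain predicates over a hypothetical snapshot of tpacket_snd() at the point where packet_current_frame() returns NULL. struct snd_state and the gate functions are illustrative; sent_any stands in for the v4 patch's skb != NULL test and pending for packet_read_pending().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of tpacket_snd() when no more frames are found. */
struct snd_state {
	bool need_wait;        /* sendmsg() called without MSG_DONTWAIT */
	bool sent_any;         /* v4 gate: skb != NULL */
	unsigned int pending;  /* packet_read_pending(&po->tx_ring) */
	unsigned int done;     /* po->skb_completion.done */
};

typedef bool (*wait_gate)(const struct snd_state *);

/* Gate used in the v4 patch: wait only if this call sent a frame. */
static bool gate_on_skb(const struct snd_state *s)
{
	return s->need_wait && s->sent_any;
}

/* Alternative gate: wait only while a frame is still unacknowledged. */
static bool gate_on_pending(const struct snd_state *s)
{
	return s->need_wait && s->pending != 0;
}

/* Enters the wait only if the gate passes, then actually sleeps only if
 * no complete() has been recorded yet. */
static bool would_sleep(const struct snd_state *s, wait_gate gate)
{
	return gate(s) && s->done == 0;
}

static void gate_scenarios(void)
{
	/* Nothing marked TP_STATUS_SEND_REQUEST: both gates skip the wait. */
	struct snd_state none = { true, false, 0, 0 };
	assert(!would_sleep(&none, gate_on_skb));
	assert(!would_sleep(&none, gate_on_pending));

	/* Destructor already ran: the skb gate enters the wait but falls
	 * through on done; the pending gate never enters it. */
	struct snd_state acked = { true, true, 0, 1 };
	assert(gate_on_skb(&acked) && !would_sleep(&acked, gate_on_skb));
	assert(!gate_on_pending(&acked));

	/* Frame still in flight: both gates sleep until complete(). */
	struct snd_state inflight = { true, true, 1, 0 };
	assert(would_sleep(&inflight, gate_on_skb));
	assert(would_sleep(&inflight, gate_on_pending));
}
```

Both gates avoid the hang when nothing was queued; they differ only in whether an already-acknowledged send enters wait_for_completion_interruptible_timeout() and falls through, or skips it entirely.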

One possible refinement would be to keep po->wait_on_complete (but
rename it po->wake_on_complete), set it before entering the loop and
clear it before the function returns (both within the pg_vec_lock critical
section), and test it in tpacket_destruct_skb to avoid calling
complete for MSG_DONTWAIT sends. But I don't think it's worth the complexity.

One rare edge case is a MSG_DONTWAIT send followed by a !MSG_DONTWAIT one.
A tpacket_destruct_skb for a packet from the first call can then run
during the second call, after its call to reinit_completion. That would
cause the next wait to return before *its* packets have been sent. But due
to the packet_read_pending test in the while () condition, it will loop
again and go back to waiting. So that's fine.
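The edge case can be replayed step by step in a hypothetical single-threaded model (struct tx_model and late_destructor() are illustrative). The loop condition is reduced to its need_wait && packet_read_pending() half, assuming no further frames are marked TP_STATUS_SEND_REQUEST.

```c
#include <assert.h>

/* Hypothetical model: 'pending' stands in for
 * packet_read_pending(&po->tx_ring), 'done' for po->skb_completion.done. */
struct tx_model {
	unsigned int pending;
	unsigned int done;
};

/* Models tpacket_destruct_skb(): complete() once pending drops to zero. */
static void destruct_one(struct tx_model *m)
{
	if (--m->pending == 0)
		m->done++;
}

/* Frame A from an earlier MSG_DONTWAIT call is still in flight when a
 * blocking call starts and reinits the completion; A's destructor then
 * fires before frame B is queued. Reports how often the wait returned on
 * a stale event and how often it really slept. */
static void late_destructor(int *early_returns, int *sleeps)
{
	struct tx_model m = { 1, 0 };  /* frame A in flight */

	*early_returns = 0;
	*sleeps = 0;

	m.done = 0;          /* reinit_completion() in the blocking call */
	destruct_one(&m);    /* A acknowledged: pending 0, done 1 */
	m.pending++;         /* frame B queued */

	/* Simplified loop exit test: keep waiting while frames are pending. */
	while (m.pending != 0) {
		if (m.done != 0) {
			m.done--;            /* wait returns at once on A's event */
			(*early_returns)++;
			continue;            /* pending is still 1: wait again */
		}
		(*sleeps)++;             /* would sleep; B's destructor wakes us */
		destruct_one(&m);
		m.done--;                /* the wakeup consumes B's event */
	}
	assert(m.done == 0);
}
```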

Thanks for bearing with me.

Reviewed-by: Willem de Bruijn <willemb@google.com>

Thread overview: 34+ messages
2019-06-19 20:25 [PATCH net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET Neil Horman
2019-06-20 13:41 ` Willem de Bruijn
2019-06-20 14:01   ` Matteo Croce
2019-06-20 14:23   ` Neil Horman
2019-06-20 15:16     ` Willem de Bruijn
2019-06-20 16:14       ` Neil Horman
2019-06-20 16:18         ` Willem de Bruijn
2019-06-20 17:31           ` Neil Horman
2019-06-21 16:41       ` Neil Horman
2019-06-21 18:31         ` Willem de Bruijn
2019-06-21 19:18           ` Neil Horman
2019-06-21 20:06             ` Willem de Bruijn
2019-06-22 11:08               ` Neil Horman
2019-06-22 17:41 ` [PATCH v2 " Neil Horman
2019-06-23  2:12   ` Willem de Bruijn
2019-06-23  2:21     ` Willem de Bruijn
2019-06-23 11:40       ` Neil Horman
2019-06-23 14:39         ` Willem de Bruijn
2019-06-23 19:21           ` Neil Horman
2019-06-23 11:34     ` Neil Horman
2019-06-24  0:46 ` [PATCH v3 " Neil Horman
2019-06-24 18:08   ` Willem de Bruijn
2019-06-24 21:51     ` Neil Horman
2019-06-24 22:15       ` Willem de Bruijn
2019-06-25 11:02         ` Neil Horman
2019-06-25 13:37           ` Willem de Bruijn
2019-06-25 16:20             ` Neil Horman
2019-06-25 21:59               ` Willem de Bruijn
2019-06-25 21:57 ` [PATCH v4 " Neil Horman
2019-06-25 22:30   ` Willem de Bruijn
2019-06-26 10:54     ` Neil Horman
2019-06-26 15:05       ` Willem de Bruijn [this message]
2019-06-26 17:14         ` Neil Horman
2019-06-27  2:38   ` David Miller
