From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Network Development <netdev@vger.kernel.org>,
	Matteo Croce <mcroce@redhat.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH v4 net] af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Date: Wed, 26 Jun 2019 11:05:39 -0400
Message-ID: <CAF=yD-+_khMRCK0gE2q7nAi8fAtwvZ2FerHZKo1U1M-=991+Zg@mail.gmail.com>
In-Reply-To: <20190626105403.GA31355@hmswarspite.think-freely.org>

On Wed, Jun 26, 2019 at 6:54 AM Neil Horman <nhorman@tuxdriver.com> wrote:
>
> On Tue, Jun 25, 2019 at 06:30:08PM -0400, Willem de Bruijn wrote:
> > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > > index a29d66da7394..a7ca6a003ebe 100644
> > > --- a/net/packet/af_packet.c
> > > +++ b/net/packet/af_packet.c
> > > @@ -2401,6 +2401,9 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
> > >
> > >                 ts = __packet_set_timestamp(po, ph, skb);
> > >                 __packet_set_status(po, ph, TP_STATUS_AVAILABLE | ts);
> > > +
> > > +               if (!packet_read_pending(&po->tx_ring))
> > > +                       complete(&po->skb_completion);
> > >         }
> > >
> > >         sock_wfree(skb);
> > > @@ -2585,7 +2588,7 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
> > >
> > >  static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >  {
> > > -       struct sk_buff *skb;
> > > +       struct sk_buff *skb = NULL;
> > >         struct net_device *dev;
> > >         struct virtio_net_hdr *vnet_hdr = NULL;
> > >         struct sockcm_cookie sockc;
> > > @@ -2600,6 +2603,7 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >         int len_sum = 0;
> > >         int status = TP_STATUS_AVAILABLE;
> > >         int hlen, tlen, copylen = 0;
> > > +       long timeo = 0;
> > >
> > >         mutex_lock(&po->pg_vec_lock);
> > >
> > > @@ -2646,12 +2650,21 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
> > >         if ((size_max > dev->mtu + reserve + VLAN_HLEN) && !po->has_vnet_hdr)
> > >                 size_max = dev->mtu + reserve + VLAN_HLEN;
> > >
> > > +       reinit_completion(&po->skb_completion);
> > > +
> > >         do {
> > >                 ph = packet_current_frame(po, &po->tx_ring,
> > >                                           TP_STATUS_SEND_REQUEST);
> > >                 if (unlikely(ph == NULL)) {
> > > -                       if (need_wait && need_resched())
> > > -                               schedule();
> > > +                       if (need_wait && skb) {
> > > +                               timeo = sock_sndtimeo(&po->sk, msg->msg_flags & MSG_DONTWAIT);
> > > +                               timeo = wait_for_completion_interruptible_timeout(&po->skb_completion, timeo);
> >
> > This looks really nice.
> >
> > But isn't it still susceptible to the race where tpacket_destruct_skb
> > is called in between po->xmit and this
> > wait_for_completion_interruptible_timeout?
> >
> That's not an issue, since the complete is only gated on packet_read_pending
> reaching 0 in tpacket_destruct_skb.  Previously it was gated on my
> wait_on_complete flag being non-zero, so we had to set that prior to calling
> po->xmit, or the complete call might never get made, resulting in a hang.  Now,
> we will always call complete, and the completion api allows for arbitrary
> ordering of complete/wait_for_complete (since its internal done variable gets
> incremented), making a call to wait_for_complete effectively a fall through if
> complete gets called first.

Perfect! I was not aware that it handles that internally. Hadn't read
do_wait_for_common closely before.
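
For reference, a minimal userspace sketch of that counting behavior. This is a toy model under stated assumptions, not the kernel code: toy_completion, toy_complete and toy_try_wait are made-up names, only the done counter is modeled, and a real wait would sleep where this returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct completion: only the 'done' counter is modeled. */
struct toy_completion {
	unsigned int done;
};

/* Models complete(): record one wakeup, even if nobody is waiting yet. */
static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/*
 * Models the non-sleeping part of a wait: if a wakeup was already
 * recorded, consume it and return true. A false return is where the
 * real code would go to sleep.
 */
static bool toy_try_wait(struct toy_completion *c)
{
	if (c->done == 0)
		return false;
	c->done--;
	return true;
}

/* complete() before wait: the wait falls straight through, exactly once. */
static void toy_demo_complete_before_wait(void)
{
	struct toy_completion c = { .done = 0 };

	assert(!toy_try_wait(&c));	/* nothing recorded yet: would sleep */
	toy_complete(&c);		/* the destructor ran first */
	assert(toy_try_wait(&c));	/* the later wait returns at once */
	assert(!toy_try_wait(&c));	/* the wakeup was consumed */
}
```

Because the wakeup is stored rather than delivered to a waiter, the order of the two calls does not matter.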

> There is an odd path here though.  If an application calls sendmsg on a packet
> socket here with MSG_DONTWAIT set, then need_wait will be zero, and we will
> eventually exit this loop without ever having called wait_for_complete, but
> tpacket_destruct_skb will still have called complete when all the frames
> complete transmission.  In and of itself, that's fine, but it leaves the
> completion structure in a state where its done variable will have been
> incremented at least once (specifically it will be set to N, where N is the
> number of frames transmitted during the call where MSG_DONTWAIT is set).  If the
> application then calls sendmsg on this socket with MSG_DONTWAIT clear, we will
> call wait_for_complete, but immediately return from it (due to the previously
> made calls to complete).  I've corrected this, however, by adding that call to
> reinit_completion prior to the loop entry, so that we are always guaranteed to
> have the completion variable set properly to wait for only the frames being sent
> in this particular instance of the sendmsg call.

Yep, understood.
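
A small userspace sketch of the stale-count problem and the reinit fix, for illustration only. All names here (toy_completion, toy_reinit, second_send_would_block) are hypothetical, and the number of stale completions is taken as a plain parameter rather than derived from the ring state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct completion: only the 'done' counter is modeled. */
struct toy_completion {
	unsigned int done;
};

/* Models complete(): record one wakeup. */
static void toy_complete(struct toy_completion *c)
{
	c->done++;
}

/* Models reinit_completion(): forget any wakeups recorded so far. */
static void toy_reinit(struct toy_completion *c)
{
	c->done = 0;
}

/*
 * Returns true if a blocking send that follows a MSG_DONTWAIT send would
 * actually sleep at its first wait. 'stale' is the number of completions
 * left behind by the non-blocking call (an assumed input).
 */
static bool second_send_would_block(unsigned int stale, bool reinit_first)
{
	struct toy_completion c = { .done = 0 };
	unsigned int i;

	for (i = 0; i < stale; i++)
		toy_complete(&c);	/* nobody waited for these */

	if (reinit_first)
		toy_reinit(&c);		/* what the patch does before the loop */

	return c.done == 0;		/* a wait sleeps only if done is 0 */
}
```

With stale > 0 and no reinit, the function returns false, which is the premature return described above; with the reinit it returns true.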

>
> > The test for skb is shorthand for packet_read_pending  != 0, right?
> >
> Sort of.  gating on skb guarantees for us that we have sent at least one frame
> in this call to tpacket_snd.  If we didn't do that, then it would be possible
> for an application to call sendmsg without setting any frames in the buffer to
> TP_STATUS_SEND_REQUEST, which would cause us to wait for a completion without
> having sent any frames, meaning we would block waiting for an event
> (tpacket_destruct_skb), that will never happen.  The check for skb ensures that
> tpacket_snd_skb will get called, and that we will get a wakeup from a call to
> wait_for_complete.  It does suggest that packet_read_pending != 0, but that's not
> guaranteed, because tpacket_destruct_skb may already have been called (see the
> above explanation regarding ordering of complete/wait_for_complete).

But the inverse is true: if sleeping is gated on packet_read_pending,
the process only ever waits if a packet is still to be acknowledged.
Then both the wait and the wake clearly depend on the same state.

Either way works, I think. So this is definitely fine.

One possible refinement would be to keep po->wait_on_complete (but
rename it to po->wake_on_complete), set it before entering the loop and
clear it before function return (both within the pg_vec_lock critical
section). And test that in tpacket_destruct_skb to avoid calling
complete if MSG_DONTWAIT. But I don't think it's worth the complexity.

One rare edge case is a MSG_DONTWAIT send followed by a !MSG_DONTWAIT one.
It is then possible for a tpacket_destruct_skb to run as a result of
the first call, during the second call, after the call to
reinit_completion. That would cause the next wait to return before
*its* packets have been sent. But due to the packet_read_pending test
in the while () condition it will loop again and return to wait. So
that's fine.
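
A single-threaded toy model of that edge case, as a rough sketch only. The names (toy_tx, toy_destruct, toy_wait) are invented, the send loop is reduced to its wait and re-check, and "sleeping" is simulated by letting the device finish a frame.

```c
#include <assert.h>

struct toy_tx {
	unsigned int pending;	/* frames handed to the device, not yet freed */
	unsigned int done;	/* recorded completions */
};

/* Models tpacket_destruct_skb(): free one frame, signal when none remain. */
static void toy_destruct(struct toy_tx *tx)
{
	if (--tx->pending == 0)
		tx->done++;
}

/*
 * Models a blocking wait. A recorded completion is consumed at once;
 * otherwise sleeping is simulated by letting the device free frames
 * until a completion is recorded.
 */
static void toy_wait(struct toy_tx *tx)
{
	while (tx->done == 0) {
		assert(tx->pending > 0);	/* otherwise nothing would wake us */
		toy_destruct(tx);
	}
	tx->done--;
}

/* Returns how many times the blocking send had to wait. */
static unsigned int toy_blocking_send_after_stale_wakeup(void)
{
	/* One frame is still in flight from the earlier MSG_DONTWAIT call. */
	struct toy_tx tx = { .pending = 1, .done = 0 };
	unsigned int waits = 0;

	tx.done = 0;		/* reinit_completion() in the blocking call */
	toy_destruct(&tx);	/* old frame freed late: stale completion */
	tx.pending++;		/* the blocking call queues its own frame */

	do {
		toy_wait(&tx);
		waits++;
	} while (tx.pending != 0);	/* the packet_read_pending re-check */

	return waits;
}
```

In this model the first wait returns early on the stale completion, the re-check sees a frame still pending, and the second wait covers the call's own frame, so the function returns 2.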

Thanks for bearing with me.

Reviewed-by: Willem de Bruijn <willemb@google.com>
