linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jon Rosen (jrosen)" <jrosen@cisco.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Kees Cook <keescook@chromium.org>,
	David Windsor <dwindsor@gmail.com>,
	"Rosen, Rami" <rami.rosen@intel.com>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	"Mike Maloney" <maloney@google.com>,
	Benjamin Poirier <bpoirier@suse.com>,
	"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: RE: [RFC PATCH] packet: mark ring entry as in-use inside spin_lock to prevent RX ring overrun
Date: Wed, 4 Apr 2018 14:59:48 +0000	[thread overview]
Message-ID: <057239cdf1c34bc69636009c9e099214@XCH-RTP-016.cisco.com> (raw)
In-Reply-To: <CAF=yD-L8DaS7LW4SAaTRd2RPxVnfrdiVLCdWJxxaTu3Hn8ORSA@mail.gmail.com>

On Wednesday, April 04, 2018 9:49 AM, Willem de Bruijn <willemb@google.com> wrote:
> 
> On Tue, Apr 3, 2018 at 11:55 PM, Jon Rosen <jrosen@cisco.com> wrote:
> > Fix PACKET_RX_RING bug for versions TPACKET_V1 and TPACKET_V2 which
> > casues the ring to get corrupted by allowing multiple kernel threads
> > to claim ownership of the same ring entry, Mark the ring entry as
> > already being used within the spin_lock to prevent other kernel
> > threads from reusing the same entry before it's fully filled in,
> > passed to user space, and then eventually passed back to the kernel
> > for use with a new packet.
> >
> > Note that the proposed change may modify the semantics of the
> > interface between kernel space and user space in a way which may cause
> > some applications to no longer work properly.
> 
> As long as TP_STATUS_USER (1) is not set, userspace should ignore
> the slot..
> 
> >    One issue with the above proposed change to use TP_STATUS_IN_PROGRESS
> >    is that the documentation of the tp_status field is somewhat
> >    inconsistent.  In some places it's described as TP_STATUS_KERNEL(0)
> >    meaning the entry is owned by the kernel and !TP_STATUS_KERNEL(0)
> >    meaning the entry is owned by user space.  In other places ownership
> >    by user space is defined by the TP_STATUS_USER(1) bit being set.
> 
> But indeed this example in packet_mmap.txt is problematic
> 
>     if (status == TP_STATUS_KERNEL)
>         retval = poll(&pfd, 1, timeout);
> 
> It does not really matter whether the docs are possibly inconsistent and
> which one is authoritative. Examples like the above make it likely that
> some user code expects such code to work.

Yes, that's exactly my concern.  Yet another troubling example seems to be
lipbcap which also is looking specifically for status to be anything other than
TP_STATUS_KERNEL(0) to indicate a frame is available in user space.

Either way things are broken. They are broken as they stand now because the
ring can get overrun and the kernel and user space tracking of the ring can
get out of sync.  And they are broken with the below change because some user
space applications will be looking for anything other than TP_STATUS_KERNEL,
so again the ring will get out of sync.

The difference here being that the way it is today is on average (across all environments
and across all user space apps) less likely to occur while with the change below it is
much more likely to occur.

Maybe the right answer here is to implement a fix that is compatible for existing
applications and accept any potential performance impacts and then add yet another
version (TPACKET_V4?) which more strictly requires the TP_STATUS_USER bit for
passing ownership.

> 
> > +++ b/net/packet/af_packet.c
> > @@ -2287,6 +2287,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> >                 if (po->stats.stats1.tp_drops)
> >                         status |= TP_STATUS_LOSING;
> >         }
> > +
> > +        /*
> > +         * Mark this entry as TP_STATUS_IN_PROGRESS to prevent other
> > +         * kernel threads from re-using this same entry.
> > +         */
> > +#define TP_STATUS_IN_PROGRESS TP_STATUS_LOSING
> 
> No need to reinterpret existing flags. tp_status is a u32 with
> sufficient undefined bits.

Agreed.

> 
> > +       if (po->tp_version <= TPACKET_V2)
> > +            __packet_set_status(po, h.raw, TP_STATUS_IN_PROGRESS);
> > +
> >         po->stats.stats1.tp_packets++;
> >         if (copy_skb) {
> >                 status |= TP_STATUS_COPY;
> > --
> > 2.10.3.dirty
> >

Thanks for the feedback!
Jon.

  reply	other threads:[~2018-04-04 14:59 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-03 21:55 [RFC PATCH] packet: mark ring entry as in-use inside spin_lock to prevent RX ring overrun Jon Rosen
2018-04-03 23:16 ` Stephen Hemminger
2018-04-04 14:34   ` Jon Rosen (jrosen)
2018-04-04 13:49 ` Willem de Bruijn
2018-04-04 14:59   ` Jon Rosen (jrosen) [this message]
2018-04-04 21:44     ` Willem de Bruijn
2018-04-04 22:31       ` Jon Rosen (jrosen)
2018-05-19 12:25         ` Jon Rosen (jrosen)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=057239cdf1c34bc69636009c9e099214@XCH-RTP-016.cisco.com \
    --to=jrosen@cisco.com \
    --cc=bpoirier@suse.com \
    --cc=davem@davemloft.net \
    --cc=dwindsor@gmail.com \
    --cc=edumazet@google.com \
    --cc=elena.reshetova@intel.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maloney@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=rami.rosen@intel.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).