From: "Michael S. Tsirkin" <mst@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Network Development <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH net] net/packet: tpacket_rcv: do not increment ring index on drop
Date: Tue, 10 Mar 2020 17:57:55 -0400
Message-ID: <20200310175627-mutt-send-email-mst@kernel.org>
In-Reply-To: <CA+FuTSfrjThis9UchhiKE2ibMKVgCvfTdbeB0Q33XiTDLBEX8w@mail.gmail.com>

On Tue, Mar 10, 2020 at 05:35:55PM -0400, Willem de Bruijn wrote:
> On Tue, Mar 10, 2020 at 5:30 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Mar 10, 2020 at 11:38:16AM -0400, Willem de Bruijn wrote:
> > > On Tue, Mar 10, 2020 at 10:44 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Mar 10, 2020 at 10:16:56AM -0400, Willem de Bruijn wrote:
> > > > > On Tue, Mar 10, 2020 at 8:59 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Mar 10, 2020 at 08:49:23AM -0400, Willem de Bruijn wrote:
> > > > > > > On Tue, Mar 10, 2020 at 2:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Mar 09, 2020 at 11:34:35AM -0400, Willem de Bruijn wrote:
> > > > > > > > > From: Willem de Bruijn <willemb@google.com>
> > > > > > > > >
> > > > > > > > > In one error case, tpacket_rcv drops packets after incrementing the
> > > > > > > > > ring producer index.
> > > > > > > > >
> > > > > > > > > If this happens, it does not update tp_status to TP_STATUS_USER and
> > > > > > > > > thus the reader is stalled for an iteration of the ring, causing out
> > > > > > > > > of order arrival.
> > > > > > > > >
> > > > > > > > > The only such error path is when virtio_net_hdr_from_skb fails due
> > > > > > > > > to encountering an unknown GSO type.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > I wonder whether it should drop packets with unknown GSO types at all.
> > > > > > > > > This consistently blinds the reader to certain packets, including
> > > > > > > > > recent UDP and SCTP GSO types.
> > > > > > > >
> > > > > > > > Ugh it looks like you have found a bug.  Consider a legacy userspace -
> > > > > > > > > it was actually broken by adding UDP and SCTP GSO.  I suspect the right
> > > > > > > > thing to do here is actually to split these packets up, not drop them.
> > > > > > >
> > > > > > > In the main virtio users, virtio_net/tun/tap, the packets will always
> > > > > > > arrive segmented, due to these devices not advertising hardware
> > > > > > > segmentation for these protocols.
> > > > > >
> > > > > > Oh right. That's good then, sorry about the noise.
> > > > >
> > > > > Not at all. Thanks for taking a look!
> > > > >
> > > > > > > So the issue is limited to users of tpacket_rcv, which is relatively
> > > > > > > new. There too it is limited on egress to devices that do advertise
> > > > > > > > h/w offload. And on rx to GRO.
> > > > > > >
> > > > > > > The UDP GSO issue precedes the fraglist GRO patch, by the way, and
> > > > > > > goes back to my (argh!) introduction of the feature on the egress
> > > > > > > path.
> > > > > > >
> > > > > > > >
> > > > > > > > > The peer function virtio_net_hdr_to_skb already drops any packets with
> > > > > > > > > unknown types, so it should be fine to add an SKB_GSO_UNKNOWN type and
> > > > > > > > > let the peer at least be aware of failure.
> > > > > > > > >
> > > > > > > > > And possibly add SKB_GSO_UDP_L4 and SKB_GSO_SCTP types to virtio too.
> > > > > > > >
> > > > > > > > This last one is possible for sure, but for virtio_net_hdr_from_skb
> > > > > > > > we'll need more flags to know whether it's safe to pass
> > > > > > > > these types to userspace.
> > > > > > >
> > > > > > > > Can you elaborate? Since virtio_net_hdr_to_skb already returns
> > > > > > > -EINVAL on unknown GSO types and its callers just drop these packets,
> > > > > > > it looks to me that the infra is future proof wrt adding new GSO
> > > > > > > types.
> > > > > >
> > > > > > Oh I mean if we do want to add new types and want to pass them to
> > > > > > > users, then virtio_net_hdr_from_skb will need a flag so it
> > > > > > knows whether that will or won't confuse userspace.
> > > > >
> > > > > I'm not sure how that would work. Ignoring other tun/tap/virtio for
> > > > > now, just looking at tpacket, a new variant of socket option for
> > > > > PACKET_VNET_HDR, for every new GSO type?
> > > >
> > > > Maybe a single one with a bitmap of legal types?
> > > >
> > > > > In practice the userspace I'm aware of, and any sane implementation,
> > > > > will be future proof to drop and account packets whose type it cannot
> > > > > process. So I think we can just add new types.
> > > >
> > > > Well if packets are just dropped then userspace breaks right?
> > >
> > > It is an improvement over the current silent discard in the kernel.
> > >
> > > If it can count these packets, userspace becomes notified that it
> > > should perhaps upgrade or use ethtool to stop the kernel from
> > > generating certain packets.
> > >
> > > Specifically for packet sockets, it wants to receive packets as they
> > > appear "on the wire". It does not have to drop these today even, but
> > > can easily parse the headers.
> > >
> > > For packet sockets at least, I don't think that we want transparent
> > > segmentation.
> >
> > Well if GSO is in the way then it's no longer "on the wire", right?
> > Whether we split these back to individual skbs or we don't
> > it's individual packets that are on the wire. GSO just allows
> > passing them to the application in a more efficient way.
> 
> Not entirely. With TSO enabled, packet sockets will show the TCP TSO
> packets, not the individual segment on the wire.

But nothing breaks if it shows a segment on the wire while Linux
processes packets in batches, right? It's just some extra info that
an app can't handle, so we hide it from the app...

-- 
MST

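For readers following the thread, the stall described in the original
patch (the producer index advances, but the slot is never marked
TP_STATUS_USER) can be illustrated with a toy ring. This is only a
sketch under simplifying assumptions, not the tpacket_rcv code: the
names struct ring, produce_buggy, produce_fixed and consume are
hypothetical, and the hdr_ok flag stands in for
virtio_net_hdr_from_skb succeeding or failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4u

/* Hypothetical stand-ins for TP_STATUS_KERNEL / TP_STATUS_USER. */
enum slot_status { SLOT_KERNEL = 0, SLOT_USER = 1 };

struct ring {
	enum slot_status status[RING_SIZE];
	int payload[RING_SIZE];
	unsigned int head;	/* producer index */
	unsigned int tail;	/* consumer index */
};

void ring_init(struct ring *r)
{
	for (unsigned int i = 0; i < RING_SIZE; i++) {
		r->status[i] = SLOT_KERNEL;
		r->payload[i] = 0;
	}
	r->head = 0;
	r->tail = 0;
}

/* Buggy order: advance the index first, then fail without publishing. */
bool produce_buggy(struct ring *r, int pkt, bool hdr_ok)
{
	unsigned int slot = r->head;

	if (r->status[slot] != SLOT_KERNEL)
		return false;		/* ring full */
	r->head = (r->head + 1) % RING_SIZE;
	if (!hdr_ok)
		return false;		/* slot skipped, status stays KERNEL */
	r->payload[slot] = pkt;
	r->status[slot] = SLOT_USER;
	return true;
}

/* Fixed order: decide whether to drop before touching the index. */
bool produce_fixed(struct ring *r, int pkt, bool hdr_ok)
{
	unsigned int slot = r->head;

	if (r->status[slot] != SLOT_KERNEL)
		return false;		/* ring full */
	if (!hdr_ok)
		return false;		/* drop, index unchanged */
	r->head = (r->head + 1) % RING_SIZE;
	r->payload[slot] = pkt;
	r->status[slot] = SLOT_USER;
	return true;
}

/* Reader: consume in order and stop at the first slot not marked USER. */
size_t consume(struct ring *r, int *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	while (n < max && r->status[r->tail] == SLOT_USER) {
		out[n++] = r->payload[r->tail];
		r->status[r->tail] = SLOT_KERNEL;
		r->tail = (r->tail + 1) % RING_SIZE;
	}
	return n;
}
```

With the buggy producer, a failed packet between two good ones leaves a
hole that the reader cannot step over, so the later packet only becomes
visible after the producer wraps around and fills the hole. With the
fixed producer the reader sees both good packets back to back.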

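The two ideas raised in the quoted discussion, a bitmap of GSO types the
reader declares it can handle and a reader that drops and counts types
it cannot process, might look roughly like this on the userspace side.
This is a sketch under assumptions: the GSO_* values mirror the
VIRTIO_NET_HDR_GSO_* constants from <linux/virtio_net.h>, while
GSO_BIT, struct rx_stats, gso_type_allowed and account_packet are
hypothetical names, and no such bitmap socket option exists in the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror VIRTIO_NET_HDR_GSO_* in <linux/virtio_net.h>. */
#define GSO_NONE	0
#define GSO_TCPV4	1
#define GSO_UDP		3
#define GSO_TCPV6	4
#define GSO_ECN		0x80	/* modifier flag, not a type of its own */

/* Hypothetical: one bit per base GSO type the reader accepts. */
#define GSO_BIT(type)	(UINT32_C(1) << (type))

/* Hypothetical per-socket counters kept by the reader. */
struct rx_stats {
	uint64_t accepted;
	uint64_t unknown_gso;
};

/* True if the base type (ECN flag masked off) is in the allowed bitmap. */
bool gso_type_allowed(uint32_t allowed_mask, uint8_t gso_type)
{
	unsigned int base = gso_type & ~(unsigned int)GSO_ECN;

	if (base >= 32)
		return false;	/* cannot be represented in the bitmap */
	return (allowed_mask & GSO_BIT(base)) != 0;
}

/* Count the packet as accepted or as an unknown GSO type; never silent. */
bool account_packet(struct rx_stats *stats, uint32_t allowed_mask,
		    uint8_t gso_type)
{
	assert(stats != NULL);
	if (!gso_type_allowed(allowed_mask, gso_type)) {
		stats->unknown_gso++;
		return false;
	}
	stats->accepted++;
	return true;
}
```

The point of the counter is the one made in the thread: a reader that
can see how many packets it rejected learns that it should upgrade or
disable the offload, which a silent drop in the kernel does not allow.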