From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Tom Herbert <tom@herbertland.com>
Cc: Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	brouer@redhat.com
Subject: Re: [RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers
Date: Mon, 22 May 2017 22:42:29 +0200
Message-ID: <20170522224229.540ce059@redhat.com>
In-Reply-To: <20170522083935.4d82174f@redhat.com>

On Mon, 22 May 2017 08:39:35 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Sun, 21 May 2017 15:10:29 -0700
> Tom Herbert <tom@herbertland.com> wrote:
> 
> > On Sun, May 21, 2017 at 9:04 AM, Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:  
> > > On Sat, 20 May 2017 09:16:09 -0700
> > > Tom Herbert <tom@herbertland.com> wrote:
> > >    
> > >> > +/* XDP rxhash have an associated type, which is related to the RSS
> > >> > + * (Receive Side Scaling) standard, but NIC HW have different mapping
> > >> > + * and support. Thus, create mapping that is interesting for XDP.  XDP
> > >> > + * would primarly want insight into L3 and L4 protocol info.
> > >> > + *
> > >> > + * TODO: Likely need to get extended with "L3_IPV6_EX" due RSS standard
> > >> > + *
> > >> > + * The HASH_TYPE will be returned from bpf helper as the top 32-bit of
> > >> > + * the 64-bit rxhash (internally type stored in xdp_buff->flags).
> > >> > + */
> > >> > +#define XDP_HASH(x)            ((x) & ((1ULL << 32)-1))
> > >> > +#define XDP_HASH_TYPE(x)       ((x) >> 32)
> > >> > +
> > >> > +#define XDP_HASH_TYPE_L3_SHIFT 0
> > >> > +#define XDP_HASH_TYPE_L3_BITS  3
> > >> > +#define XDP_HASH_TYPE_L3_MASK  ((1ULL << XDP_HASH_TYPE_L3_BITS)-1)
> > >> > +#define XDP_HASH_TYPE_L3(x)    ((x) & XDP_HASH_TYPE_L3_MASK)
> > >> > +enum {
> > >> > +       XDP_HASH_TYPE_L3_IPV4 = 1,
> > >> > +       XDP_HASH_TYPE_L3_IPV6,
> > >> > +};
> > >> > +
> > >> > +#define XDP_HASH_TYPE_L4_SHIFT XDP_HASH_TYPE_L3_BITS
> > >> > +#define XDP_HASH_TYPE_L4_BITS  5
> > >> > +#define XDP_HASH_TYPE_L4_MASK                                          \
> > >> > +       (((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT)
> > >> > +#define XDP_HASH_TYPE_L4(x)    ((x) & XDP_HASH_TYPE_L4_MASK)
> > >> > +enum {
> > >> > +       _XDP_HASH_TYPE_L4_TCP = 1,
> > >> > +       _XDP_HASH_TYPE_L4_UDP,
> > >> > +};
> > >> > +#define XDP_HASH_TYPE_L4_TCP (_XDP_HASH_TYPE_L4_TCP << XDP_HASH_TYPE_L4_SHIFT)
> > >> > +#define XDP_HASH_TYPE_L4_UDP (_XDP_HASH_TYPE_L4_UDP << XDP_HASH_TYPE_L4_SHIFT)
> > >> > +    
> > >> Hi Jesper,
> > >>
> > >> Why do we need these indicators for protocol specific hash? It seems
> > >> like L4 and L3 is useful differentiation and protocol agnostic (I'm
> > >> still holding out hope that SCTP will be deployed some day ;-) )    
> > >
> > > I'm not sure I understood the question fully, but let me try to answer
> > > anyway.  To me it seems obvious that you would want to know the
> > > protocol/L4 type, as this helps avoid hash collisions between UDP and
> > > TCP flows.  I can easily imagine someone constructing an UDP packet
> > > that could hash collide with a given TCP flow.
> > >
> > > And yes, i40e supports matching SCTP, and we will create an
> > > XDP_HASH_TYPE_L4_SCTP when adding XDP rxhash support for that driver.
> > >    
> > But where would this information be used? We don't save it in skbuff,
> > don't use it in RPS, RFS. RSS doesn't use it for packet steering so
> > the hash collision problem already exists at the device level. If
> > there is a collision problem between two protocols then maybe hash
> > should be over 5-tuple instead...  
> 
> One use-case (I heard at a customer) was that they had (web-)servers
> that didn't serve any UDP traffic, thus they simply block/drop all
> incoming UDP on the service NIC (as an ACL in the switch). (The servers'
> own DNS lookups and NTP go through the management NIC to internal
> DNS/NTP servers.)
> 
> Another use-case: Inside an XDP/bpf program it can be used for
> splitting protocol processing into different tail calls, before even
> touching packet-data.  I can imagine the bpf TCP handling code is
> larger, thus an optimization is to have a separate tail call for the
> UDP protocol handling.  One could also transfer/queue all TCP traffic
> to other CPU(s) like RPS, just without touching packet memory.
> 
> 
> This info is saved in the skb, but due to space constraints, it is
> reduced to a single bit, namely skb->l4_hash, iff some
> RSS-proto/XDP_HASH_TYPE_L4_* bit was set.  And the network stack does
> use and react to this.
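
A minimal sketch of the tail-call split and the skb->l4_hash reduction
quoted above, written as plain user-space C rather than real BPF.  The
macros are copied from the RFC patch (a proposal, not a merged API); the
slot numbers and the helpers pick_slot() and l4_hash_bit() are
hypothetical names made up for illustration.  A real XDP program would
presumably pass the chosen slot to bpf_tail_call().

```c
#include <stdint.h>

/* Copied from the RFC patch quoted earlier in this mail (proposal only). */
#define XDP_HASH_TYPE(x)       ((x) >> 32)
#define XDP_HASH_TYPE_L3_BITS  3
#define XDP_HASH_TYPE_L4_SHIFT XDP_HASH_TYPE_L3_BITS
#define XDP_HASH_TYPE_L4_BITS  5
#define XDP_HASH_TYPE_L4_MASK \
	(((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT)
#define XDP_HASH_TYPE_L4(x)    ((x) & XDP_HASH_TYPE_L4_MASK)
enum {
	_XDP_HASH_TYPE_L4_TCP = 1,
	_XDP_HASH_TYPE_L4_UDP,
};
#define XDP_HASH_TYPE_L4_TCP (_XDP_HASH_TYPE_L4_TCP << XDP_HASH_TYPE_L4_SHIFT)
#define XDP_HASH_TYPE_L4_UDP (_XDP_HASH_TYPE_L4_UDP << XDP_HASH_TYPE_L4_SHIFT)

/* Hypothetical slot numbers for a tail-call program array. */
enum prog_slot { SLOT_OTHER = 0, SLOT_TCP = 1, SLOT_UDP = 2 };

/* Pick a handler slot from the 64-bit helper return value alone,
 * without reading any packet data.
 */
static enum prog_slot pick_slot(uint64_t ret)
{
	switch (XDP_HASH_TYPE_L4(XDP_HASH_TYPE(ret))) {
	case XDP_HASH_TYPE_L4_TCP:
		return SLOT_TCP;
	case XDP_HASH_TYPE_L4_UDP:
		return SLOT_UDP;
	default:
		return SLOT_OTHER;
	}
}

/* Collapse the L4 type to one bit, similar in spirit to skb->l4_hash:
 * 1 if any L4 type bit is set, else 0.
 */
static int l4_hash_bit(uint64_t ret)
{
	return XDP_HASH_TYPE_L4(XDP_HASH_TYPE(ret)) != 0;
}
```

The point of the sketch is only that the decision needs nothing but the
helper's return value; the packet memory is never touched.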
 
I also want to mention another real customer use-case.  Some
deployments have VXLAN-based networks, but some NICs cannot parse
VXLAN and so cannot do proper RSS rx-hashing, which results in bad CPU
scaling, as all VXLAN packets get delivered to the same CPU.

Thus, I would like to implement recalculation of the RXHASH in XDP, as
that would save me from implementing yet another extension to the flow
dissector that the kernel would have to carry forever, when this is
really just a matter of NIC hashing getting improved.

With the extra L3 and L4 info, I'm assuming that XDP_HASH_TYPE_L3(x)
and XDP_HASH_TYPE_L4(x) will be zero for VXLAN, as the NIC cannot
identify this.  Thus, I can know at an early stage which packets need
to get a new rxhash.
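
As a rough illustration of that check, in plain user-space C rather
than BPF: the macros are taken from the RFC patch, while
rxhash_needs_recalc() is a hypothetical helper name.  It relies on the
assumption stated above (not verified against any NIC) that the
hardware reports a zero L3 and L4 type for traffic it cannot parse.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the RFC patch quoted earlier in this mail (proposal only). */
#define XDP_HASH(x)            ((x) & ((1ULL << 32)-1))
#define XDP_HASH_TYPE(x)       ((x) >> 32)
#define XDP_HASH_TYPE_L3_BITS  3
#define XDP_HASH_TYPE_L3_MASK  ((1ULL << XDP_HASH_TYPE_L3_BITS)-1)
#define XDP_HASH_TYPE_L3(x)    ((x) & XDP_HASH_TYPE_L3_MASK)
#define XDP_HASH_TYPE_L4_SHIFT XDP_HASH_TYPE_L3_BITS
#define XDP_HASH_TYPE_L4_BITS  5
#define XDP_HASH_TYPE_L4_MASK \
	(((1ULL << XDP_HASH_TYPE_L4_BITS)-1) << XDP_HASH_TYPE_L4_SHIFT)
#define XDP_HASH_TYPE_L4(x)    ((x) & XDP_HASH_TYPE_L4_MASK)

/* Hypothetical helper: true when the NIC reported neither an L3 nor an
 * L4 hash type, i.e. the packet is a candidate for a software rxhash.
 * Assumes unparsed traffic (e.g. VXLAN on some NICs) yields type zero.
 */
static bool rxhash_needs_recalc(uint64_t ret)
{
	uint64_t type = XDP_HASH_TYPE(ret);

	return XDP_HASH_TYPE_L3(type) == 0 && XDP_HASH_TYPE_L4(type) == 0;
}
```

Only the top 32 bits are inspected; XDP_HASH(ret) still gives whatever
hash value the hardware delivered, which would then be overwritten.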

I've seen a similar problem with Q-in-Q double-tagged VLANs, where the
RSS-hash distribution fails the same way...

I hope that explains what this can be used for(?)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

