From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Brenden Blanco <bblanco@plumgrid.com>,
	davem@davemloft.net, netdev@vger.kernel.org, tom@herbertland.com,
	ogerlitz@mellanox.com, daniel@iogearbox.net,
	john.fastabend@gmail.com, brouer@redhat.com
Subject: Re: [RFC PATCH 4/5] mlx4: add support for fast rx drop bpf program
Date: Fri, 1 Apr 2016 19:47:12 -0700
Message-ID: <20160402024710.GA59703@ast-mbp.thefacebook.com>
In-Reply-To: <1459562911.6473.299.camel@edumazet-glaptop3.roam.corp.google.com>

On Fri, Apr 01, 2016 at 07:08:31PM -0700, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 18:21 -0700, Brenden Blanco wrote:
> > Add support for the BPF_PROG_TYPE_PHYS_DEV hook in mlx4 driver.  Since
> > bpf programs require a skb context to navigate the packet, build a
> > percpu fake skb with the minimal fields. This avoids the costly
> > allocation for packets that end up being dropped.
> > 
> 
> 
> > +		/* A bpf program gets first chance to drop the packet. It may
> > +		 * read bytes but not past the end of the frag. A non-zero
> > +		 * return indicates packet should be dropped.
> > +		 */
> > +		if (prog) {
> > +			struct ethhdr *ethh;
> > +
> > +			ethh = (struct ethhdr *)(page_address(frags[0].page) +
> > +						 frags[0].page_offset);
> > +			if (mlx4_call_bpf(prog, ethh, length)) {
> > +				priv->stats.rx_dropped++;
> > +				goto next;
> > +			}
> > +		}
> > +
> 
> 
> 1) mlx4 can use multiple fragments (priv->num_frags) to hold an Ethernet
> frame. 
> 
> Still you pass a single fragment but total 'length' here : BPF program
> can read past the end of this first fragment and panic the box.
> 
> Please take a look at mlx4_en_complete_rx_desc() and you'll see what I
> mean.

yep.
My reading of that part was that num_frags > 1 only happens with large
mtu sizes, so if we limit this to num_frags == 1 for now
we should be ok, and it would still cover most of the use cases?
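
A minimal userspace sketch of that restriction, not the actual mlx4
change: struct rx_cfg, frag0_size and bpf_drop_allowed() are
hypothetical names invented for illustration. The extra length check
against the first fragment is also an assumption on top of the
num_frags == 1 rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's RX configuration. */
struct rx_cfg {
	int num_frags;     /* fragments needed to hold one frame */
	size_t frag0_size; /* bytes available in the first fragment */
};

/*
 * The early-drop program may run only when the whole frame lives in
 * the first fragment, so it cannot read past the end of that fragment.
 */
static bool bpf_drop_allowed(const struct rx_cfg *cfg, size_t length)
{
	assert(cfg != NULL);
	return cfg->num_frags == 1 && length <= cfg->frag0_size;
}
```

With this guard a device configured for a large mtu (num_frags > 1)
would simply not run the program instead of risking an out-of-bounds
read.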

> 2) priv->stats.rx_dropped is shared by all the RX queues -> false
> sharing.

yes. good point. I bet it was copy-pasted from a few lines below.
Should be trivial to convert it to percpu.
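
A rough userspace model of the idea, assuming a 64-byte cache line and
a fixed ring count: every name below (rx_ring_stats, ring_count_drop,
total_rx_dropped, NUM_RINGS) is hypothetical and not taken from the
mlx4 driver. Each RX queue bumps its own cache-line-aligned counter,
and the total is summed only when stats are read.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */
#define NUM_RINGS  8  /* hypothetical number of RX rings */

/* One counter per ring, padded so rings never share a cache line. */
struct rx_ring_stats {
	alignas(CACHE_LINE) uint64_t rx_dropped;
};

static struct rx_ring_stats ring_stats[NUM_RINGS];

/* Hot path: only the ring's own cache line is written. */
static void ring_count_drop(unsigned int ring)
{
	assert(ring < NUM_RINGS);
	ring_stats[ring].rx_dropped++;
}

/* Slow path: fold the per-ring counters into one total. */
static uint64_t total_rx_dropped(void)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NUM_RINGS; i++)
		sum += ring_stats[i].rx_dropped;
	return sum;
}
```

The cost moves from the per-packet path to the reader, which only
walks the rings when someone asks for statistics.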

>    This is probably the right time to add a rx_dropped field in struct
> mlx4_en_rx_ring since you guys want to drop 14 Mpps, and 50 Mpps on
> higher speed links.

yes, could be per ring as well.
My guess is that we're hitting the 14.5Mpps limit, both for the empty
bpf program and for the program that actually looks into the packet,
because we're hitting the 10G phy limit of the 40G nic, since
physically a 40G nic consists of four 10G phys. There will be the same
problem with 100G and 50G nics: both will hit the 25G phy limit.
We need to vary the packets somehow. Hopefully Or can explain that
bit of hw design.
Jesper's experiments with mlx4 showed the same 14.5Mpps limit
when the sender was blasting the same packet over and over again.
Great to see the experiments converging.
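
For reference, a back-of-the-envelope sketch of the theoretical packet
rate of a single lane. It assumes minimum-size 64-byte Ethernet frames
plus 20 bytes of preamble, start delimiter and inter-frame gap on the
wire; the thread does not state the packet size, so that is an
assumption, and max_pps() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* 7 bytes preamble + 1 byte start delimiter + 12 bytes inter-frame gap */
#define ETH_WIRE_OVERHEAD 20

/* Upper bound on packets per second for a link of link_bps bits/s. */
static uint64_t max_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes > 0);
	return link_bps / (((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD) * 8);
}
```

Under those assumptions max_pps(10000000000ULL, 64) comes out at about
14.88 Mpps, close to the 14.5Mpps seen in the tests, and a 25G lane
would top out at roughly 37.2 Mpps.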

