From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Brenden Blanco <bblanco@plumgrid.com>,
	David Miller <davem@davemloft.net>,
	Netdev <netdev@vger.kernel.org>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Saeed Mahameed <saeedm@dev.mellanox.co.il>,
	Martin KaFai Lau <kafai@fb.com>, Ari Saha <as754m@att.com>,
	Or Gerlitz <gerlitz.or@gmail.com>,
	john fastabend <john.fastabend@gmail.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>,
	Thomas Graf <tgraf@suug.ch>, Tom Herbert <tom@herbertland.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Tariq Toukan <ttoukan.linux@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: order-0 vs order-N driver allocation. Was: [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support
Date: Thu, 4 Aug 2016 20:55:36 -0700
Message-ID: <20160805035534.GA56390@ast-mbp.thefacebook.com>
In-Reply-To: <CAKgT0UdbVK6Ti9drCQFfa0MyU40Kh=Hu=BtDTRCqqsSiBvJ7rg@mail.gmail.com>

On Thu, Aug 04, 2016 at 05:30:56PM -0700, Alexander Duyck wrote:
> On Thu, Aug 4, 2016 at 9:19 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 3 Aug 2016 10:45:13 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> >> On Mon, Jul 25, 2016 at 09:35:20AM +0200, Eric Dumazet wrote:
> >> > On Tue, 2016-07-19 at 12:16 -0700, Brenden Blanco wrote:
> >> > > The mlx4 driver by default allocates order-3 pages for the ring to
> >> > > consume in multiple fragments. When the device has an xdp program, this
> >> > > behavior will prevent tx actions since the page must be re-mapped in
> >> > > TODEVICE mode, which cannot be done if the page is still shared.
> >> > >
> >> > > Start by making the allocator configurable based on whether xdp is
> >> > > running, such that order-0 pages are always used and never shared.
> >> > >
> >> > > Since this will stress the page allocator, add a simple page cache to
> >> > > each rx ring. Pages in the cache are left dma-mapped, and in drop-only
> >> > > stress tests the page allocator is eliminated from the perf report.
> >> > >
> >> > > Note that setting an xdp program will now require the rings to be
> >> > > reconfigured.
> >> >
> >> > Again, this has nothing to do with XDP?
> >> >
> >> > Please submit a separate patch switching this driver to order-0
> >> > allocations.
> >> >
> >> > I mentioned this order-3 vs order-0 issue earlier [1] and proposed to
> >> > send a generic patch, but I have been traveling lately and am currently
> >> > on vacation.
> >> >
> >> > order-3 pages are problematic when dealing with hostile traffic anyway,
> >> > so we should exclusively use order-0 pages, with page recycling like
> >> > the Intel drivers.
> >> >
> >> > [1] http://lists.openwall.net/netdev/2016/04/11/88
> >>
> >> Completely agree. These multi-page tricks work only for benchmarks and
> >> not for production.
> >> Eric, if you can submit that patch for mlx4, that would be awesome.
> >>
> >> I think we should default to order-0 for both mlx4 and mlx5.
> >> Alternatively, we are thinking of adding a netlink or ethtool switch to
> >> preserve the old behavior, but frankly I don't see who needs these
> >> order-N allocation schemes.
> >
> > I actually agree that we should switch to order-0 allocations.
> >
> > *BUT* this will cause performance regressions on platforms with
> > expensive DMA operations (as they no longer amortize the cost of
> > mapping a larger page).

order-0 is mainly about correctness under memory pressure.
As Eric pointed out, order-N allocation is a serious issue under hostile
traffic, but it is a problem even for normal traffic: as memory gets
fragmented, sooner or later only order-0 pages will be available.
Performance considerations come second.
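
Eric's "hostile traffic" point can be illustrated with back-of-the-envelope arithmetic. This is an editor's sketch, not code from the thread: it assumes 4 KiB base pages, and it assumes a sender can make the receiver keep one small fragment alive from each page it touches (for example, packets left sitting in a socket receive queue). A shared order-N page is only freed once every fragment carved from it has been released, so each surviving fragment pins the whole page.

```c
#include <assert.h>
#include <stddef.h>

#define BASE_PAGE_SIZE 4096u /* assumption: 4 KiB base pages */

/* Size in bytes of one page of the given allocation order. */
static size_t page_bytes(unsigned int order)
{
	return (size_t)BASE_PAGE_SIZE << order;
}

/* How many fixed-stride rx fragments fit in one page of this order. */
static size_t frags_per_page(unsigned int order, size_t stride)
{
	assert(stride > 0 && stride <= page_bytes(order));
	return page_bytes(order) / stride;
}

/* Memory pinned when one fragment from each of `touched_pages` pages is
 * kept alive: a page cannot be freed or recycled until every fragment
 * carved from it is released, so each survivor pins a whole page. */
static size_t pinned_bytes(unsigned int order, size_t touched_pages)
{
	return touched_pages * page_bytes(order);
}
```

In this model, 1000 retained fragments pin 32,768,000 bytes with order-3 pages versus 4,096,000 bytes with order-0 pages, an 8x amplification.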

> The trick is to use page reuse like we do for the Intel NICs.  If you
> can get away with just reusing the page, you don't have to keep making
> the expensive map/unmap calls.

You mean the two-packets-per-page trick?
I think it's a trade-off between performance and memory.
It's useful. I wish there were a knob to turn it on/off instead
of relying on an MTU size threshold.
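
The page-reuse / two-packets-per-page idea can be modeled in user space. This is an editor's sketch of the general technique, not the Intel driver code: `struct rx_page`, `half_page_mode()`, `rx_reuse_page()` and `rx_stack_release()` are hypothetical names, and a plain `int` stands in for the kernel's page reference count. A 4 KiB page is split into two 2 KiB halves; while the stack holds one half, the driver reposts the other, and it can keep the existing DMA mapping only if nobody still holds the half it is about to reuse.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_PAGE_SIZE 4096u
#define RX_HALF      (RX_PAGE_SIZE / 2) /* 2 KiB per rx buffer */

/* Hypothetical model of one DMA-mapped rx page. */
struct rx_page {
	int refcount;        /* driver holds 1; each half held by the stack adds 1 */
	unsigned int offset; /* half currently posted to the NIC: 0 or RX_HALF */
};

/* The trick only works when a whole frame fits in half a page, hence the
 * MTU-derived threshold (headroom/tailroom ignored in this model). */
static bool half_page_mode(unsigned int frame_len)
{
	return frame_len <= RX_HALF;
}

/* Called after the half at p->offset was filled and handed to the stack.
 * Returns true if the page can be reposted without a new allocation or
 * DMA map; false if both halves are now held by the stack, in which case
 * a real driver would drop its reference and replace the page (not
 * modeled here). */
static bool rx_reuse_page(struct rx_page *p)
{
	p->refcount++;           /* the stack now owns the filled half */
	if (p->refcount > 2)     /* the other half is still in use as well */
		return false;
	p->offset ^= RX_HALF;    /* flip to the half the stack is not using */
	return true;
}

/* The stack frees one of the buffers it was given. */
static void rx_stack_release(struct rx_page *p)
{
	assert(p->refcount > 1);
	p->refcount--;
}
```

Reuse keeps succeeding as long as the stack returns each half before the driver needs it again; once both halves are held, the driver has to fall back to a fresh page and mapping.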

> > I started coding the page-pool last week; it addresses both the
> > DMA mapping and recycling (with fewer atomic ops). (P.S. Still on
> > vacation this week.)
> >
> > http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
> 
> I really wonder if we couldn't get away with creating some sort of
> two-tiered allocator for this.  Instead of allocating a page pool, we
> would just reserve blocks of memory like we do with huge pages.  Then you
> have essentially a huge page that is mapped to a given device for DMA
> and reserved for it to use as a memory resource to allocate the order-0
> pages out of.  Doing it that way would likely have multiple
> advantages when working with things like an IOMMU: since the pages would
> all belong to one linear block, it would likely consume fewer
> resources on those devices, and it wouldn't be that far off from how
> DPDK makes use of huge pages to improve its memory
> access times.
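
The two-tier idea above (map one large block for the device once, then carve order-0 pages out of it) can be sketched as a free-list allocator. This is an editor's user-space model under stated assumptions: `struct dma_region` and the `region_*` helpers are hypothetical, `malloc` stands in for reserving the block, and `dma_base` is a made-up bus address standing in for the result of a single DMA mapping of the whole block. Because the block is contiguous, a page's bus address is just an offset from the base, so no per-page map/unmap call is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SUB_PAGE_SIZE 4096u /* order-0 page size (assumed 4 KiB) */

/* Hypothetical pool: one block mapped once, handed out in 4 KiB pages. */
struct dma_region {
	uint8_t  *cpu_base;   /* CPU view of the reserved block */
	uint64_t  dma_base;   /* stand-in for the block's single DMA address */
	size_t    npages;
	size_t   *free_stack; /* indices of free pages */
	size_t    nfree;
};

static int region_init(struct dma_region *r, size_t npages, uint64_t dma_base)
{
	r->cpu_base = malloc(npages * SUB_PAGE_SIZE);
	r->free_stack = malloc(npages * sizeof(*r->free_stack));
	if (!r->cpu_base || !r->free_stack) {
		free(r->cpu_base);
		free(r->free_stack);
		return -1;
	}
	r->dma_base = dma_base;
	r->npages = npages;
	r->nfree = npages;
	for (size_t i = 0; i < npages; i++)
		r->free_stack[i] = npages - 1 - i; /* page 0 is handed out first */
	return 0;
}

/* Returns a page index, or -1 when the region is exhausted. */
static long region_alloc(struct dma_region *r)
{
	if (r->nfree == 0)
		return -1;
	return (long)r->free_stack[--r->nfree];
}

/* Bus address of a page: base + index * page size, no per-page mapping. */
static uint64_t region_dma_addr(const struct dma_region *r, long idx)
{
	assert(idx >= 0 && (size_t)idx < r->npages);
	return r->dma_base + (uint64_t)idx * SUB_PAGE_SIZE;
}

static void *region_cpu_addr(const struct dma_region *r, long idx)
{
	assert(idx >= 0 && (size_t)idx < r->npages);
	return r->cpu_base + (size_t)idx * SUB_PAGE_SIZE;
}

static void region_free(struct dma_region *r, long idx)
{
	assert(idx >= 0 && (size_t)idx < r->npages && r->nfree < r->npages);
	r->free_stack[r->nfree++] = (size_t)idx;
}

static void region_destroy(struct dma_region *r)
{
	free(r->cpu_base);
	free(r->free_stack);
}
```

The open questions raised in the reply below (how large the block should be, and whether the rest of the kernel loses that memory) map directly onto the `npages` argument here.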

Interesting idea. Like dma_map'ing a 1GB region and then allocating
pages only from it? But then the rest of the kernel won't be able
to use them. So only some smaller region? Or would it be
a boot-time flag to reserve this pseudo-huge page?

I don't think any of that is needed for XDP. As the current mlx4
implementation demonstrates, it's very fast already, with no bottlenecks
in the page allocator. The tiny page recycle array does the magic
because most of the traffic is not going to the stack.
This order-0 vs order-N discussion is about the main stack,
not XDP.
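
The "tiny page recycle array" can be modeled as a fixed-size LIFO stack per rx ring, in line with the patch description of a simple per-ring page cache whose pages stay DMA-mapped. This is an editor's sketch, not the mlx4 code: `struct page_cache`, `cache_put()`, `cache_get()` and the size of 128 are assumptions, and `void *` stands in for a mapped page. When the XDP program drops most packets, freed pages go straight back into the array and the refill path takes them out again, so the page allocator and the DMA API are rarely called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CACHE_SIZE 128 /* assumed size: small and fixed */

/* Hypothetical per-rx-ring cache of pages that are still DMA-mapped. */
struct page_cache {
	void  *pages[RING_CACHE_SIZE];
	size_t count;
};

/* Drop path: park the still-mapped page instead of unmapping and freeing it.
 * Returns false when the cache is full; the caller then unmaps and frees. */
static bool cache_put(struct page_cache *c, void *page)
{
	assert(page != NULL);
	if (c->count == RING_CACHE_SIZE)
		return false;
	c->pages[c->count++] = page;
	return true;
}

/* Refill path: take a mapped page from the cache; NULL means the caller
 * must fall back to the page allocator plus a fresh DMA mapping. */
static void *cache_get(struct page_cache *c)
{
	if (c->count == 0)
		return NULL;
	return c->pages[--c->count];
}
```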

Thread overview: 59+ messages
2016-07-19 19:16 [PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding Brenden Blanco
2016-07-19 19:16 ` [PATCH v10 01/12] bpf: add bpf_prog_add api for bulk prog refcnt Brenden Blanco
2016-07-19 21:46   ` Alexei Starovoitov
2016-07-19 19:16 ` [PATCH v10 02/12] bpf: add XDP prog type for early driver filter Brenden Blanco
2016-07-19 21:33   ` Alexei Starovoitov
2016-07-19 19:16 ` [PATCH v10 03/12] net: add ndo to setup/query xdp prog in adapter rx Brenden Blanco
2016-07-19 19:16 ` [PATCH v10 04/12] rtnl: add option for setting link xdp prog Brenden Blanco
2016-07-20  8:38   ` Daniel Borkmann
2016-07-20 17:35     ` Brenden Blanco
2016-07-19 19:16 ` [PATCH v10 05/12] net/mlx4_en: add support for fast rx drop bpf program Brenden Blanco
2016-07-19 21:41   ` Alexei Starovoitov
2016-07-20  9:07   ` Daniel Borkmann
2016-07-20 17:33     ` Brenden Blanco
2016-07-24 11:56   ` Jesper Dangaard Brouer
2016-07-24 16:57   ` Tom Herbert
2016-07-24 20:34     ` Daniel Borkmann
2016-07-19 19:16 ` [PATCH v10 06/12] Add sample for adding simple drop program to link Brenden Blanco
2016-07-19 21:44   ` Alexei Starovoitov
2016-07-19 19:16 ` [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support Brenden Blanco
2016-07-19 21:49   ` Alexei Starovoitov
2016-07-25  7:35   ` Eric Dumazet
2016-08-03 17:45     ` order-0 vs order-N driver allocation. Was: " Alexei Starovoitov
2016-08-04 16:19       ` Jesper Dangaard Brouer
2016-08-05  0:30         ` Alexander Duyck
2016-08-05  3:55           ` Alexei Starovoitov [this message]
2016-08-05 15:15             ` Alexander Duyck
2016-08-05 15:33               ` David Laight
2016-08-05 16:00                 ` Alexander Duyck
2016-08-05  7:15         ` Eric Dumazet
2016-08-08  2:15           ` Alexei Starovoitov
2016-08-08  8:01             ` Jesper Dangaard Brouer
2016-08-08 18:34               ` Alexei Starovoitov
2016-08-09 12:14                 ` Jesper Dangaard Brouer
2016-07-19 19:16 ` [PATCH v10 08/12] bpf: add XDP_TX xdp_action for direct forwarding Brenden Blanco
2016-07-19 21:53   ` Alexei Starovoitov
2016-07-19 19:16 ` [PATCH v10 09/12] net/mlx4_en: break out tx_desc write into separate function Brenden Blanco
2016-07-19 19:16 ` [PATCH v10 10/12] net/mlx4_en: add xdp forwarding and data write support Brenden Blanco
2016-07-19 19:16 ` [PATCH v10 11/12] bpf: enable direct packet data write for xdp progs Brenden Blanco
2016-07-19 21:59   ` Alexei Starovoitov
2016-07-19 19:16 ` [PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite Brenden Blanco
2016-07-19 22:05   ` Alexei Starovoitov
2016-07-20 17:38     ` Brenden Blanco
2016-07-27 18:25     ` Jesper Dangaard Brouer
2016-08-03 17:01   ` Tom Herbert
2016-08-03 17:11     ` Alexei Starovoitov
2016-08-03 17:29       ` Tom Herbert
2016-08-03 18:29         ` David Miller
2016-08-03 18:29         ` Brenden Blanco
2016-08-03 18:31           ` David Miller
2016-08-03 19:06           ` Tom Herbert
2016-08-03 22:36             ` Alexei Starovoitov
2016-08-03 23:18               ` Daniel Borkmann
2016-07-20  5:09 ` [PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding David Miller
     [not found]   ` <6a09ce5d-f902-a576-e44e-8e1e111ae26b@gmail.com>
2016-07-20 14:08     ` Brenden Blanco
2016-07-20 19:14     ` David Miller
