All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Björn Töpel" <bjorn.topel@gmail.com>
To: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	Netdev <netdev@vger.kernel.org>,
	"Jeff Kirsher" <jeffrey.t.kirsher@intel.com>,
	intel-wired-lan <intel-wired-lan@lists.osuosl.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>
Subject: Re: [PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes
Date: Wed, 5 Sep 2018 21:15:14 +0200	[thread overview]
Message-ID: <CAJ+HfNjQCP7c7gCVJi=LRxK2LvQrNvn6cPfRACW6TPzM8339SA@mail.gmail.com> (raw)
In-Reply-To: <20180905191437.35f7d049@cakuba>

Den ons 5 sep. 2018 kl 19:14 skrev Jakub Kicinski
<jakub.kicinski@netronome.com>:
>
> On Tue,  4 Sep 2018 20:11:01 +0200, Björn Töpel wrote:
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > This series addresses an AF_XDP zero-copy issue that buffers passed
> > from userspace to the kernel was leaked when the hardware descriptor
> > ring was torn down.
> >
> > The patches fixes the i40e AF_XDP zero-copy implementation.
> >
> > Thanks to Jakub Kicinski for pointing this out!
> >
> > Some background for folks that don't know the details: A zero-copy
> > capable driver picks buffers off the fill ring and places them on the
> > hardware Rx ring to be completed at a later point when DMA is
> > complete. Similar on the Tx side; The driver picks buffers off the Tx
> > ring and places them on the Tx hardware ring.
> >
> > In the typical flow, the Rx buffer will be placed onto an Rx ring
> > (completed to the user), and the Tx buffer will be placed on the
> > completion ring to notify the user that the transfer is done.
> >
> > However, if the driver needs to tear down the hardware rings for some
> > reason (interface goes down, reconfiguration and such), the userspace
> > buffers cannot be leaked. They have to be reused or completed back to
> > userspace.
> >
> > The implementation does the following:
> >
> > * Outstanding Tx descriptors will be passed to the completion
> >   ring. The Tx code has back-pressure mechanism in place, so that
> >   enough empty space in the completion ring is guaranteed.
> >
> > * Outstanding Rx descriptors are temporarily stored on a stash/reuse
> >   queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
> >   comes up again, entries from the stash are used to re-populate the
> >   ring.
> >
> > * When AF_XDP ZC is enabled, disallow changing the number of hardware
> >   descriptors via ethtool. Otherwise, the size of the stash/reuse
> >   queue can grow unbounded.
> >
> > Going forward, introducing a "zero-copy allocator" analogous to Jesper
> > Brouer's page pool would be a more robust and reuseable solution.
> >
> > Jakub: I've made a minor checkpatch-fix to your RFC, prior adding it
> > into this series.
>
> Thanks for the fix! :)
>
> Out of curiosity, did checking the reuse queue have a noticeable impact
> in your test (i.e. always using the _rq() helpers)?  You seem to be
> adding an indirect call, would that not be way worse on a retpoline
> kernel?

Do you mean the indirection in __i40e_alloc_rx_buffers_zc (patch #3)?
The indirect call is elided by the __always_inline -- without that
retpoline took 2.5Mpps worth of Rx. :-(

I'm only using the _rq helpers in the configuration/slow path, so the
fast-path is unchanged.


Björn

WARNING: multiple messages have this Message-ID (diff)
From: =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?= <bjorn.topel@gmail.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] [PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes
Date: Wed, 5 Sep 2018 21:15:14 +0200	[thread overview]
Message-ID: <CAJ+HfNjQCP7c7gCVJi=LRxK2LvQrNvn6cPfRACW6TPzM8339SA@mail.gmail.com> (raw)
In-Reply-To: <20180905191437.35f7d049@cakuba>

Den ons 5 sep. 2018 kl 19:14 skrev Jakub Kicinski
<jakub.kicinski@netronome.com>:
>
> On Tue,  4 Sep 2018 20:11:01 +0200, Bj?rn T?pel wrote:
> > From: Bj?rn T?pel <bjorn.topel@intel.com>
> >
> > This series addresses an AF_XDP zero-copy issue that buffers passed
> > from userspace to the kernel was leaked when the hardware descriptor
> > ring was torn down.
> >
> > The patches fixes the i40e AF_XDP zero-copy implementation.
> >
> > Thanks to Jakub Kicinski for pointing this out!
> >
> > Some background for folks that don't know the details: A zero-copy
> > capable driver picks buffers off the fill ring and places them on the
> > hardware Rx ring to be completed at a later point when DMA is
> > complete. Similar on the Tx side; The driver picks buffers off the Tx
> > ring and places them on the Tx hardware ring.
> >
> > In the typical flow, the Rx buffer will be placed onto an Rx ring
> > (completed to the user), and the Tx buffer will be placed on the
> > completion ring to notify the user that the transfer is done.
> >
> > However, if the driver needs to tear down the hardware rings for some
> > reason (interface goes down, reconfiguration and such), the userspace
> > buffers cannot be leaked. They have to be reused or completed back to
> > userspace.
> >
> > The implementation does the following:
> >
> > * Outstanding Tx descriptors will be passed to the completion
> >   ring. The Tx code has back-pressure mechanism in place, so that
> >   enough empty space in the completion ring is guaranteed.
> >
> > * Outstanding Rx descriptors are temporarily stored on a stash/reuse
> >   queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
> >   comes up again, entries from the stash are used to re-populate the
> >   ring.
> >
> > * When AF_XDP ZC is enabled, disallow changing the number of hardware
> >   descriptors via ethtool. Otherwise, the size of the stash/reuse
> >   queue can grow unbounded.
> >
> > Going forward, introducing a "zero-copy allocator" analogous to Jesper
> > Brouer's page pool would be a more robust and reuseable solution.
> >
> > Jakub: I've made a minor checkpatch-fix to your RFC, prior adding it
> > into this series.
>
> Thanks for the fix! :)
>
> Out of curiosity, did checking the reuse queue have a noticeable impact
> in your test (i.e. always using the _rq() helpers)?  You seem to be
> adding an indirect call, would that not be way worse on a retpoline
> kernel?

Do you mean the indirection in __i40e_alloc_rx_buffers_zc (patch #3)?
The indirect call is elided by the __always_inline -- without that
retpoline took 2.5Mpps worth of Rx. :-(

I'm only using the _rq helpers in the configuration/slow path, so the
fast-path is unchanged.


Bj?rn

  reply	other threads:[~2018-09-05 23:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-04 18:11 [PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes Björn Töpel
2018-09-04 18:11 ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-04 18:11 ` [PATCH bpf-next 1/4] i40e: clean zero-copy XDP Tx ring on shutdown/reset Björn Töpel
2018-09-04 18:11   ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-04 18:11 ` [PATCH bpf-next 2/4] net: xsk: add a simple buffer reuse queue Björn Töpel
2018-09-04 18:11   ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-04 18:11 ` [PATCH bpf-next 3/4] i40e: clean zero-copy XDP Rx ring on shutdown/reset Björn Töpel
2018-09-04 18:11   ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-04 18:11 ` [PATCH bpf-next 4/4] i40e: disallow changing the number of descriptors when AF_XDP is on Björn Töpel
2018-09-04 18:11   ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-05 17:14 ` [PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes Jakub Kicinski
2018-09-05 17:14   ` [Intel-wired-lan] " Jakub Kicinski
2018-09-05 19:15   ` Björn Töpel [this message]
2018-09-05 19:15     ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
2018-09-06  5:55     ` Alexei Starovoitov
2018-09-06  5:55       ` [Intel-wired-lan] " Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJ+HfNjQCP7c7gCVJi=LRxK2LvQrNvn6cPfRACW6TPzM8339SA@mail.gmail.com' \
    --to=bjorn.topel@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.