All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Christoph Lameter <cl@linux.com>
Cc: "John Fastabend" <john.fastabend@gmail.com>,
	"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Alexander Duyck" <alexander.duyck@gmail.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Tom Herbert" <tom@herbertland.com>,
	"Brenden Blanco" <bblanco@plumgrid.com>,
	"Tariq Toukan" <tariqt@mellanox.com>,
	"Saeed Mahameed" <saeedm@mellanox.com>,
	"Jesse Brandeburg" <jesse.brandeburg@intel.com>,
	"Kalman Meth" <METH@il.ibm.com>,
	"Vladislav Yasevich" <vyasevich@gmail.com>,
	brouer@redhat.com
Subject: Re: Designing a safe RX-zero-copy Memory Model for Networking
Date: Tue, 13 Dec 2016 17:10:28 +0100	[thread overview]
Message-ID: <20161213171028.24dbf519@redhat.com> (raw)
In-Reply-To: <alpine.DEB.2.20.1612121200280.13607@east.gentwo.org>


On Mon, 12 Dec 2016 12:06:59 -0600 (CST) Christoph Lameter <cl@linux.com> wrote:
> On Mon, 12 Dec 2016, Jesper Dangaard Brouer wrote:
> 
> > Hmmm. If you can rely on hardware setup to give you steering and
> > dedicated access to the RX rings.  In those cases, I guess, the "push"
> > model could be a more direct API approach.  
> 
> If the hardware does not support steering then one should be able to
> provide those services in software.

This is the early demux problem.  With the push-mode of registering
memory, you need hardware steering support, for zero-copy support, as
the software step happens after DMA engine have written into the memory.

My model pre-VMA map all the pages in the RX ring (if zero-copy gets
enabled, by a single user).  The software step can filter and zero-copy
send packet-pages to the application/socket that requested this. The
disadvantage is all zero-copy application need to share this VMA
mapping.  This is solved by configuring HW filters into a RX-queue, and
then only attach your zero-copy application to that queue.


> > I was shooting for a model that worked without hardware support.
> > And then transparently benefit from HW support by configuring a HW
> > filter into a specific RX queue and attaching/using to that queue.  
> 
> The discussion here is a bit amusing since these issues have been
> resolved a long time ago with the design of the RDMA subsystem. Zero
> copy is already in wide use. Memory registration is used to pin down
> memory areas. Work requests can be filed with the RDMA subsystem that
> then send and receive packets from the registered memory regions.
> This is not strictly remote memory access but this is a basic mode of
> operations supported  by the RDMA subsystem. The mlx5 driver quoted
> here supports all of that.

I hear what you are saying.  I will look into a push-model, as it might
be a better solution.
 I will read up on RDMA + verbs and learn more about their API model.  I
even plan to write a small sample program to get a feeling for the API,
and maybe we can use that as a baseline for the performance target we
can obtain on the same HW. (Thanks to Björn for already giving me some
pointer here)


> What is bad about RDMA is that it is a separate kernel subsystem.
> What I would like to see is a deeper integration with the network
> stack so that memory regions can be registred with a network socket
> and work requests then can be submitted and processed that directly
> read and write in these regions. The network stack should provide the
> services that the hardware of the NIC does not suppport as usual.

Interesting.  So you even imagine sockets registering memory regions
with the NIC.  If we had a proper NIC HW filter API across the drivers,
to register the steering rule (like ibv_create_flow), this would be
doable, but we don't (DPDK actually have an interesting proposal[1])

 
> The RX/TX ring in user space should be an additional mode of
> operation of the socket layer. Once that is in place the "Remote
> memory acces" can be trivially implemented on top of that and the
> ugly RDMA sidecar subsystem can go away.
 
I cannot follow that 100%, but I guess you are saying we also need a
more efficient mode of handing over pages/packet to userspace (than
going through the normal socket API calls).


Appreciate your input, it challenged my thinking.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://rawgit.com/6WIND/rte_flow/master/rte_flow.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-12-13 16:10 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-05 14:31 Designing a safe RX-zero-copy Memory Model for Networking Jesper Dangaard Brouer
2016-12-12  8:38 ` Mike Rapoport
2016-12-12  9:40   ` Jesper Dangaard Brouer
2016-12-12 14:14     ` Mike Rapoport
2016-12-12 14:14       ` Mike Rapoport
2016-12-12 14:49       ` John Fastabend
2016-12-12 17:13         ` Jesper Dangaard Brouer
2016-12-12 18:06           ` Christoph Lameter
2016-12-12 18:06             ` Christoph Lameter
2016-12-13 16:10             ` Jesper Dangaard Brouer [this message]
2016-12-13 16:36               ` Christoph Lameter
2016-12-13 16:36                 ` Christoph Lameter
2016-12-13 17:43               ` John Fastabend
2016-12-13 17:43                 ` John Fastabend
2016-12-13 19:53                 ` David Miller
2016-12-13 20:08                   ` John Fastabend
2016-12-14  9:39                     ` Jesper Dangaard Brouer
2016-12-14 16:32                       ` John Fastabend
2016-12-14 16:45                         ` Alexander Duyck
2016-12-14 21:29                           ` Jesper Dangaard Brouer
2016-12-14 22:45                             ` Alexander Duyck
2016-12-15  8:28                               ` Jesper Dangaard Brouer
2016-12-15 15:59                                 ` Alexander Duyck
2016-12-15 16:38                                 ` Christoph Lameter
2016-12-14 21:04                         ` Jesper Dangaard Brouer
2016-12-13 18:39               ` Hannes Frederic Sowa
2016-12-14 17:00                 ` Christoph Lameter
2016-12-14 17:00                   ` Christoph Lameter
2016-12-14 17:37                   ` David Laight
2016-12-14 19:43                     ` Christoph Lameter
2016-12-14 19:43                       ` Christoph Lameter
2016-12-14 20:37                       ` Hannes Frederic Sowa
2016-12-14 20:37                         ` Hannes Frederic Sowa
2016-12-14 21:22                         ` Christoph Lameter
2016-12-13  9:42         ` Mike Rapoport
2016-12-12 15:10       ` Jesper Dangaard Brouer
2016-12-12 15:10         ` Jesper Dangaard Brouer
2016-12-13  8:43         ` Mike Rapoport
2016-12-13  8:43           ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161213171028.24dbf519@redhat.com \
    --to=brouer@redhat.com \
    --cc=METH@il.ibm.com \
    --cc=alexander.duyck@gmail.com \
    --cc=bblanco@plumgrid.com \
    --cc=bjorn.topel@intel.com \
    --cc=cl@linux.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mgorman@techsingularity.net \
    --cc=netdev@vger.kernel.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    --cc=tom@herbertland.com \
    --cc=vyasevich@gmail.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.