From: Jason Gunthorpe <jgg@nvidia.com>
To: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>, <netdev@vger.kernel.org>,
	<kernel-team@fb.com>, <robin.murphy@arm.com>,
	<akpm@linux-foundation.org>, <davem@davemloft.net>,
	<kuba@kernel.org>, <willemb@google.com>, <edumazet@google.com>,
	<steffen.klassert@secunet.com>, <saeedm@mellanox.com>,
	<maximmi@mellanox.com>, <bjorn.topel@intel.com>,
	<magnus.karlsson@intel.com>, <borisp@mellanox.com>,
	<david@redhat.com>
Subject: Re: [RFC PATCH v2 21/21] netgpu/nvidia: add Nvidia plugin for netgpu
Date: Tue, 28 Jul 2020 20:38:06 -0300
Message-ID: <20200728233806.GC16789@nvidia.com>
In-Reply-To: <20200728210116.56potw45eyptmlc7@bsd-mbp.dhcp.thefacebook.com>

On Tue, Jul 28, 2020 at 02:01:16PM -0700, Jonathan Lemon wrote:
> On Tue, Jul 28, 2020 at 03:19:04PM -0300, Jason Gunthorpe wrote:
> > On Mon, Jul 27, 2020 at 06:48:12PM -0700, Jonathan Lemon wrote:
> > 
> > > While the current GPU utilized is nvidia, there's nothing in the rest of
> > > the patches specific to Nvidia - an Intel or AMD GPU interface could be
> > > equally workable.
> > 
> > I think that is very misleading.
> > 
> > It looks like this patch, and all the ugly MM stuff, is done the way
> > it is *specifically* to match the clunky nv_p2p interface that only
> > the NVIDIA driver exposes.
> 
> For /this/ patch [21], this is quite true.  I'm forced to use the nv_p2p
> API if I want to use the hardware that I have.  What's being overlooked
> is that the host mem driver does not do this, nor would another GPU
> if it used p2p_dma.  I'm just providing get_page, put_page, get_dma.

Not really, the design copied the nv_p2p api design directly into
struct netgpu_functions and then aligned the rest of the parts to use
it too. Yes, other GPU drivers could also be squeezed into this API,
but if you'd never looked at the NVIDIA driver you'd never pick such a
design. It is inherently disconnected from the MM.

> > Any approach done in tree, where we can actually modify the GPU
> > driver, would do sane things like have the GPU driver itself create
> > the MEMORY_DEVICE_PCI_P2PDMA pages, use the P2P DMA API framework, use
> > dmabuf for the cross-driver attachment, etc, etc.
> 
> So why doesn't Nvidia implement the above in the driver?
> Actually a serious question, not trolling here.

A kernel mailing list is not an appropriate place to discuss a
proprietary out-of-tree driver; take questions like that to your
support channel.

> > If you are serious about advancing this then the initial patches in a
> > long road must be focused on building up the core kernel
> > infrastructure for P2P DMA to a point where netdev could consume
> > it. There has been a lot of different ideas thrown about on how to do
> > this over the years.
> 
> Yes, I'm serious about doing this work, and may not have seen or
> remember all the various ideas I've seen over time.  The netstack
> operates on pages - are you advocating replacing them with sglists?

So far, the general expectation is that any pages would be ZONE_DEVICE
MEMORY_DEVICE_PCI_P2PDMA pages created by the PCI device's driver to
cover the device's BAR. These are __iomem pages so they can't be
intermixed in the kernel with system memory pages. That detail has
been a large stumbling block in most cases.

Resolving this design issue removes most of the MM hackery in the
netgpu. Though, I have no idea if you can intermix ZONE_DEVICE pages
into skb's in the net stack.

From there, I'd expect the pages are mmaped into userspace in a VMA or
passed into a userspace dmabuf FD.

At this point, consumers like the net stack should rely on some core
APIs to extract the pages and DMA maps from the user objects: either
some new pin_user_pages() variant for VMAs, or the work that was
started on dma_buf_map_attachment() for dmabuf.

From there it needs to handle everything carefully and then call over
to the pcip2p code to validate and dma map them. There are many
missing little details along this path.

Overall there should be nothing like 'netgpu'. This is all just some
special case of an existing user memory flow where the user pages
being touched are allowed to be P2P pages or system memory and the
flow knows how to deal with the difference.
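
As a rough illustration only (not part of the original message), the
"one flow, two kinds of page" idea can be modelled in plain userspace
C. Everything here is hypothetical: the model_* names, the peer bitmask
and the address offsets are stand-ins invented for this sketch and are
not kernel APIs. It only shows the shape of a mapping step that accepts
system pages unconditionally and validates P2P pages before mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_page_kind { MODEL_PAGE_SYSTEM, MODEL_PAGE_P2P };

struct model_page {
	enum model_page_kind kind;
	uint64_t addr;   /* system physical address, or address inside a BAR */
	int provider;    /* owning device id for P2P pages, -1 for system memory */
};

struct model_dev {
	uint32_t reachable_peers; /* bit N set: P2P to provider N is allowed */
	uint64_t sys_dma_offset;  /* stand-in for a system-memory DMA mapping */
	uint64_t bus_offset;      /* stand-in for a PCI bus address translation */
};

/* Map one page for dev. Returns 0 and sets *dma, or -1 if the P2P
 * route between dev and the page's provider is not allowed. */
int model_map_page(const struct model_dev *dev,
		   const struct model_page *pg, uint64_t *dma)
{
	assert(dev && pg && dma);

	switch (pg->kind) {
	case MODEL_PAGE_SYSTEM:
		*dma = pg->addr + dev->sys_dma_offset;
		return 0;
	case MODEL_PAGE_P2P:
		/* Validate the peer before handing out a bus address. */
		if (pg->provider < 0 || pg->provider >= 32)
			return -1;
		if (!(dev->reachable_peers & (UINT32_C(1) << pg->provider)))
			return -1;
		*dma = pg->addr + dev->bus_offset;
		return 0;
	}
	return -1;
}

/* Map a pinned range of mixed pages. Returns n on success, or -1 as
 * soon as any page cannot be mapped (dma_out is then only partly filled). */
int model_map_range(const struct model_dev *dev,
		    const struct model_page *pages, size_t n,
		    uint64_t *dma_out)
{
	for (size_t i = 0; i < n; i++) {
		if (model_map_page(dev, &pages[i], &dma_out[i]) != 0)
			return -1;
	}
	return (int)n;
}
```

The point of the sketch is that the caller never branches on the page
kind itself; the difference is handled inside the common mapping step,
which is where a real implementation would call into the pcip2p code.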

More or less. All of these touch points need beefing up and additional
features.

> > > I think this is a better patch than all the various implementations of
> > > the protocol stack in the form of RDMA, driver code and device firmware.
> > 
> > Oh? You mean "better" in the sense the header split offload in the NIC
> > is better liked than a full protocol running in the NIC?
> 
> Yes.  The NIC firmware should become simpler, not more complicated.

Do you have any application benchmarks? The typical AI communication
pattern is very challenging and a state of the art RDMA implementation
gets incredible performance.

Jason
