linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-pci@vger.kernel.org,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Sagi Grimberg <sagi@grimberg.me>, Keith Busch <kbusch@kernel.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Stephen Bates <sbates@raithlin.com>
Subject: Re: [RFC PATCH 00/28] Removing struct page from P2PDMA
Date: Thu, 20 Jun 2019 13:34:35 -0600	[thread overview]
Message-ID: <91eba9a0-27b4-08b4-7c12-86e24e1bfe85@deltatee.com> (raw)
In-Reply-To: <CAPcyv4ijztOK1FUjLuFing7ps4LOHt=6z=eO=98HHWauHA+yog@mail.gmail.com>



On 2019-06-20 12:45 p.m., Dan Williams wrote:
> On Thu, Jun 20, 2019 at 9:13 AM Logan Gunthorpe <logang@deltatee.com> wrote:
>>
>> For eons there has been a debate over whether or not to use
>> struct pages for peer-to-peer DMA transactions. Pro-pagers have
>> argued that struct pages are necessary for interacting with
>> existing code like scatterlists or the bio_vecs. Anti-pagers
>> assert that the tracking of the memory is unecessary and
>> allocating the pages is a waste of memory. Both viewpoints are
>> valid, however developers working on GPUs and RDMA tend to be
>> able to do away with struct pages relatively easily
> 
> Presumably because they have historically never tried to be
> inter-operable with the block layer or drivers outside graphics and
> RDMA.

Yes, but really there are three main sets of users for P2P right now:
graphics, RDMA and NVMe. And every time a patch set comes from GPU/RDMA
people they don't bother with struct page. I seem to be the only one
trying to push P2P with NVMe and it seems to be a losing battle.

> Please spell out the value, it is not immediately obvious to me
> outside of some memory capacity savings.

There are a few things:

* Have consistency with P2P efforts as most other efforts have been
avoiding struct page. Nobody else seems to want
pci_p2pdma_add_resource() or any devm_memremap_pages() call.

* Avoid all arch-specific dependencies for P2P. With struct page the IO
memory must fit in the linear mapping. This requires some work with
RISC-V and I remember some complaints from the powerpc people regarding
this. Certainly not all arches will be able to fit the IO region into
the linear mapping space.

* Remove a bunch of PCI P2PDMA special case mapping stuff from the block
layer and RDMA interface (which I've been hearing complaints over).

* Save the struct page memory that is largely unused (as you note).

>> Previously, there have been multiple attempts[1][2] to replace
>> struct page usage with pfn_t but this has been unpopular seeing
>> it creates dangerous edge cases where unsuspecting code might
>> run accross pfn_t's they are not ready for.
> 
> That's not the conclusion I arrived at because pfn_t is specifically
> an opaque type precisely to force "unsuspecting" code to throw
> compiler assertions. Instead pfn_t was dealt its death blow here:
> 
> https://lore.kernel.org/lkml/CA+55aFzON9617c2_Amep0ngLq91kfrPiSccdZakxir82iekUiA@mail.gmail.com/

Ok, well yes the special pages are what we've done for P2PDMA today. But
I don't think Linus's criticism really applies to what's in this RFC.
For starters, P2PDMA doesn't, and has have never, used struct page to
look up the reference count. PCI BARs have no relation to the cache so
there's no need to serialize their access but this can be done
before/after the DMA addresses are submitted to the block/rdma layer if
it was required.

In fact, the only thing the struct page is used for in the current
P2PDMA implementation is a single flag indicating it's special and needs
to be mapped in a special way.
> My primary concern with this is that ascribes a level of generality
> that just isn't there for peer-to-peer dma operations. "Peer"
> addresses are not "DMA" addresses, and the rules about what can and
> can't do peer-DMA are not generically known to the block layer.

Correct, but I don't think we should teach the block layer about these
rules. In the current code, the rules are enforced outside the block
layer before the bios are submitted and this patch set doesn't change
that. The driver orchestrating P2P will always have to check the rules
and derive addresses from them (as appropriate). With the RFC the block
layer then doesn't have to care and can just handle the DMA addresses
directly.

> At least with a side object there's a chance to describe / recall those
> restrictions as these things get passed around the I/O stack, but an
> undecorated "DMA" address passed through the block layer with no other
> benefit to any subsystem besides RDMA does not feel like it advances
> the state of the art.
> 
> Again, what are the benefits of plumbing this RDMA special case?

Because I don't think it is an RDMA special case.

Logan

  parent reply	other threads:[~2019-06-20 19:34 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-20 16:12 [RFC PATCH 00/28] Removing struct page from P2PDMA Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 01/28] block: Introduce DMA direct request type Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 02/28] block: Add dma_vec structure Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 03/28] block: Warn on mis-use of dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 04/28] block: Never bounce " Logan Gunthorpe
2019-06-20 17:23   ` Jason Gunthorpe
2019-06-20 18:38     ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 05/28] block: Skip dma-direct bios in bio_integrity_prep() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 06/28] block: Support dma-direct bios in bio_advance_iter() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 07/28] block: Use dma_vec length in bio_cur_bytes() for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 08/28] block: Introduce dmavec_phys_mergeable() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 09/28] block: Introduce vec_gap_to_prev() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 10/28] block: Create generic vec_split_segs() from bvec_split_segs() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 11/28] block: Create blk_segment_split_ctx Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 12/28] block: Create helper for bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 13/28] block: Generalize bvec_should_split() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 14/28] block: Support splitting dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 15/28] block: Support counting dma-direct bio segments Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 16/28] block: Implement mapping dma-direct requests to SGs in blk_rq_map_sg() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 17/28] block: Introduce queue flag to indicate support for dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 18/28] block: Introduce bio_add_dma_addr() Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 19/28] nvme-pci: Support dma-direct bios Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 20/28] IB/core: Introduce API for initializing a RW ctx from a DMA address Logan Gunthorpe
2019-06-20 16:49   ` Jason Gunthorpe
2019-06-20 16:59     ` Logan Gunthorpe
2019-06-20 17:11       ` Jason Gunthorpe
2019-06-20 18:24         ` Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 21/28] nvmet: Split nvmet_bdev_execute_rw() into a helper function Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 22/28] nvmet: Use DMA addresses instead of struct pages for P2P Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 23/28] nvme-pci: Remove support for PCI_P2PDMA requests Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 24/28] block: Remove PCI_P2PDMA queue flag Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 25/28] IB/core: Remove P2PDMA mapping support in rdma_rw_ctx Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 26/28] PCI/P2PDMA: Remove SGL helpers Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 27/28] PCI/P2PDMA: Remove struct pages that back P2PDMA memory Logan Gunthorpe
2019-06-20 16:12 ` [RFC PATCH 28/28] memremap: Remove PCI P2PDMA page memory type Logan Gunthorpe
2019-06-20 18:45 ` [RFC PATCH 00/28] Removing struct page from P2PDMA Dan Williams
2019-06-20 19:33   ` Jason Gunthorpe
2019-06-20 20:18     ` Dan Williams
2019-06-20 20:51       ` Logan Gunthorpe
2019-06-21 17:47       ` Jason Gunthorpe
2019-06-21 17:54         ` Dan Williams
2019-06-24  7:31     ` Christoph Hellwig
2019-06-24 13:46       ` Jason Gunthorpe
2019-06-24 13:50         ` Christoph Hellwig
2019-06-24 13:55           ` Jason Gunthorpe
2019-06-24 16:53             ` Logan Gunthorpe
2019-06-24 18:16               ` Jason Gunthorpe
2019-06-24 18:28                 ` Logan Gunthorpe
2019-06-24 18:54                   ` Jason Gunthorpe
2019-06-24 19:37                     ` Logan Gunthorpe
2019-06-24 16:10         ` Logan Gunthorpe
2019-06-25  7:18           ` Christoph Hellwig
2019-06-20 19:34   ` Logan Gunthorpe [this message]
2019-06-20 23:40     ` Dan Williams
2019-06-20 23:42       ` Logan Gunthorpe
2019-06-24  7:27 ` Christoph Hellwig
2019-06-24 16:07   ` Logan Gunthorpe
2019-06-25  7:20     ` Christoph Hellwig
2019-06-25 15:57       ` Logan Gunthorpe
2019-06-25 17:01         ` Christoph Hellwig
2019-06-25 19:54           ` Logan Gunthorpe
2019-06-26  6:57             ` Christoph Hellwig
2019-06-26 18:31               ` Logan Gunthorpe
2019-06-26 20:21                 ` Jason Gunthorpe
2019-06-26 20:39                   ` Dan Williams
2019-06-26 20:54                     ` Jason Gunthorpe
2019-06-26 20:55                     ` Logan Gunthorpe
2019-06-26 20:45                   ` Logan Gunthorpe
2019-06-26 21:00                     ` Jason Gunthorpe
2019-06-26 21:18                       ` Logan Gunthorpe
2019-06-27  6:32                         ` Jason Gunthorpe
2019-06-27 16:09                           ` Logan Gunthorpe
2019-06-27 16:35                             ` Jason Gunthorpe
2019-06-27 16:49                               ` Logan Gunthorpe
2019-06-28  4:57                                 ` Jason Gunthorpe
2019-06-28 16:22                                   ` Logan Gunthorpe
2019-06-28 17:29                                     ` Jason Gunthorpe
2019-06-28 18:29                                       ` Logan Gunthorpe
2019-06-28 19:09                                         ` Jason Gunthorpe
2019-06-28 19:35                                           ` Logan Gunthorpe
2019-07-02 22:45                                             ` Jason Gunthorpe
2019-07-02 22:52                                               ` Logan Gunthorpe
2019-06-27  9:08                     ` Christoph Hellwig
2019-06-27 16:30                       ` Logan Gunthorpe
2019-06-27 17:00                         ` Christoph Hellwig
2019-06-27 18:00                           ` Logan Gunthorpe
2019-06-28 13:38                             ` Christoph Hellwig
2019-06-28 15:54                               ` Logan Gunthorpe
2019-06-27  9:01                 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=91eba9a0-27b4-08b4-7c12-86e24e1bfe85@deltatee.com \
    --to=logang@deltatee.com \
    --cc=axboe@kernel.dk \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=jgg@ziepe.ca \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).