From: Jerome Glisse <jglisse@redhat.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Christian Koenig <christian.koenig@amd.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>, Christoph Hellwig <hch@lst.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <jroedel@suse.de>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>
Subject: Re: [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma
Date: Tue, 29 Jan 2019 14:50:55 -0500
Message-ID: <20190129195055.GH3176@redhat.com>
In-Reply-To: <20190129193250.GK10108@mellanox.com>

On Tue, Jan 29, 2019 at 07:32:57PM +0000, Jason Gunthorpe wrote:
> On Tue, Jan 29, 2019 at 02:11:23PM -0500, Jerome Glisse wrote:
> > On Tue, Jan 29, 2019 at 11:36:29AM -0700, Logan Gunthorpe wrote:
> > > 
> > > 
> > > On 2019-01-29 10:47 a.m., jglisse@redhat.com wrote:
> > > 
> > > > +	/*
> > > > +	 * Optional for device drivers that want to allow peer to peer (p2p)
> > > > +	 * mapping of their vma (which can be backed by some device memory)
> > > > +	 * to another device.
> > > > +	 *
> > > > +	 * Note that the exporting device driver might not have mapped
> > > > +	 * anything inside the vma for the CPU, but might still want to
> > > > +	 * allow a peer device to access the range of memory corresponding
> > > > +	 * to a range in that vma.
> > > > +	 *
> > > > +	 * FOR PREDICTABILITY, IF A DRIVER SUCCESSFULLY MAPS A RANGE ONCE
> > > > +	 * FOR A DEVICE, THEN FURTHER MAPPINGS OF THE SAME RANGE (WHILE THE
> > > > +	 * VMA IS STILL VALID) SHOULD ALSO SUCCEED. Following this rule
> > > > +	 * allows the importing device to map once during setup and report
> > > > +	 * any failure at that time to userspace. Further mappings of the
> > > > +	 * same range might happen after mmu notifier invalidation over the
> > > > +	 * range. The exporting device can use this to move things around
> > > > +	 * (defrag BAR space for instance) or do other similar tasks.
> > > > +	 *
> > > > +	 * THE IMPORTER MUST OBEY mmu_notifier NOTIFICATIONS AND CALL
> > > > +	 * p2p_unmap() WHEN A NOTIFIER IS CALLED FOR THE RANGE! THIS CAN
> > > > +	 * HAPPEN AT ANY POINT IN TIME WITH NO LOCK HELD.
> > > > +	 *
> > > > +	 * In the functions below, the device argument is the importing
> > > > +	 * device; the exporting device is the device to which the vma
> > > > +	 * belongs.
> > > > +	 */
> > > > +	long (*p2p_map)(struct vm_area_struct *vma,
> > > > +			struct device *device,
> > > > +			unsigned long start,
> > > > +			unsigned long end,
> > > > +			dma_addr_t *pa,
> > > > +			bool write);
> > > > +	long (*p2p_unmap)(struct vm_area_struct *vma,
> > > > +			  struct device *device,
> > > > +			  unsigned long start,
> > > > +			  unsigned long end,
> > > > +			  dma_addr_t *pa);
> > > 
> > > I don't understand why we need new p2p_[un]map function pointers for
> > > this. In subsequent patches, they never appear to be set anywhere and
> > > are only called by the HMM code. I'd have expected it to be called by
> > > some core VMA code and set by HMM as that's what vm_operations_struct is
> > > for.
> > > 
> > > But the code is all very confusing, hard to follow, and seems to be
> > > missing significant chunks. So I'm not really sure what is going on.
> > 
> > It is set by the device driver when userspace does mmap(fd) where fd
> > comes from open("/dev/somedevicefile"). So it is set by the device
> > driver; HMM has nothing to do with this. It must be set in the device
> > driver's mmap callback (the mmap callback of struct file_operations).
> > For this patch you can completely ignore all the HMM patches. Maybe
> > posting this as 2 separate patchsets would make it clearer.
> > 
> > For instance see [1] for how a non-HMM driver can export its memory
> > by just setting those callbacks. Note that a proper implementation of
> > this should also include some kind of driver policy on what to allow
> > to map and what not to allow ... All this is driver specific in any
> > case.
> 
> I'm imagining that the RDMA drivers would use this interface on their
> per-process 'doorbell' BAR pages - we also wish to have P2P DMA to
> this memory. Also the entire VFIO PCI BAR mmap would be good to cover
> with this too.

Correct, you would set those callbacks on the mmap of your doorbell.
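Roughly, a driver would do something like the following. This is only a
hypothetical sketch: struct my_dev, my_p2p_map(), my_p2p_unmap() and the
doorbell_pfn field are illustrative names, and the p2p_map/p2p_unmap fields
are the ones added by this RFC, not existing kernel API:

struct my_dev {
	struct device *dev;
	unsigned long doorbell_pfn;	/* pfn of the doorbell BAR page */
};

static long my_p2p_map(struct vm_area_struct *vma, struct device *importer,
		       unsigned long start, unsigned long end,
		       dma_addr_t *pa, bool write)
{
	/* Decide whether to allow this peer and fill pa[] with the bus
	 * addresses the importing device can DMA to. */
	return 0;
}

static long my_p2p_unmap(struct vm_area_struct *vma, struct device *importer,
			 unsigned long start, unsigned long end,
			 dma_addr_t *pa)
{
	/* Tear down whatever my_p2p_map() set up for this importer. */
	return 0;
}

static const struct vm_operations_struct my_vm_ops = {
	.p2p_map	= my_p2p_map,
	.p2p_unmap	= my_p2p_unmap,
};

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct my_dev *mdev = filp->private_data;

	vma->vm_ops = &my_vm_ops;
	vma->vm_private_data = mdev;
	/* CPU mapping of the doorbell page, as the driver does today. */
	return io_remap_pfn_range(vma, vma->vm_start, mdev->doorbell_pfn,
				  vma->vm_end - vma->vm_start,
				  pgprot_noncached(vma->vm_page_prot));
}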

> 
> Jerome, I think it would be nice to have a helper scheme - I think the
> simple case would be simple remapping of PCI BAR memory, so if we
> could have, say something like:
> 
> static const struct vm_operations_struct my_ops = {
>   .p2p_map = p2p_ioremap_map_op,
>   .p2p_unmap = p2p_ioremap_unmap_op,
> };
> 
> struct ioremap_data {
>   [..]
> };
> 
> fops_mmap() {
>    vma->vm_private_data = &driver_priv->ioremap_data;
>    return p2p_ioremap_device_memory(vma, exporting_device, [..]);
> }
> 
> This closely matches at least what the RDMA drivers do, where
> p2p_ioremap_device_memory would populate the p2p_map and p2p_unmap
> pointers with sensible functions, etc.
> 
> It looks like vfio would be able to use this as well (though I am
> unsure why vfio uses remap_pfn_range instead of io_remap_pfn_range for
> BAR memory..)

Yes, a simple helper that implements a sane default is definitely a
good idea. As I was working with GPUs it was not something that
immediately popped to mind (see below). But I can certainly do a sane
set of default helpers that simple device drivers can use right away
without too much thinking on their part. I will add this for the next
posting.
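Something along these lines, maybe. A hypothetical sketch only: the
p2p_ioremap_* names come from your example above, and the struct layout
and return convention are assumptions, not existing kernel API:

struct p2p_ioremap_data {
	struct device *exporter;	/* device that owns the BAR */
	phys_addr_t base;		/* bus/BAR base backing this vma */
};

static long p2p_ioremap_map_op(struct vm_area_struct *vma,
			       struct device *importer,
			       unsigned long start, unsigned long end,
			       dma_addr_t *pa, bool write)
{
	struct p2p_ioremap_data *data = vma->vm_private_data;
	unsigned long off = start - vma->vm_start;
	unsigned long i, npages = (end - start) >> PAGE_SHIFT;

	/* Static BAR mapping: the same range always maps the same way,
	 * which gives the predictability rule for free. A real helper
	 * would go through the p2pdma/IOMMU machinery instead of handing
	 * out the raw bus address like this. */
	for (i = 0; i < npages; i++)
		pa[i] = data->base + off + (i << PAGE_SHIFT);
	return npages;	/* assumed return convention: pages mapped */
}

static long p2p_ioremap_unmap_op(struct vm_area_struct *vma,
				 struct device *importer,
				 unsigned long start, unsigned long end,
				 dma_addr_t *pa)
{
	/* Nothing to tear down for a static BAR mapping. */
	return 0;
}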

> Do any drivers need more control than this?

GPU drivers do want more control :) GPU drivers are moving things
around all the time and they have more memory than BAR space (on newer
platforms AMD GPUs do resize the BAR, but that is not the rule for all
GPUs). So GPU drivers do actively manage their BAR address space and
they map and unmap things there. They cannot allow someone to just pin
stuff there randomly or this would disrupt their regular workflow.
Hence they need control, and they might implement a threshold: for
instance, if they have more than N pages of BAR space mapped for peer
to peer, then they can decide to fall back to main memory for any new
peer mapping. A rough sketch of such a policy is below.
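For illustration only (all names here, struct my_gpu, my_gpu_from_vma,
my_gpu_p2p_page_budget and my_gpu_fill_p2p_addrs, are hypothetical driver
internals, and the budget check is just one possible policy):

struct my_gpu {
	struct mutex p2p_lock;
	unsigned long p2p_bar_pages;	/* BAR pages currently mapped for peers */
	/* ... */
};

static long my_gpu_p2p_map(struct vm_area_struct *vma, struct device *importer,
			   unsigned long start, unsigned long end,
			   dma_addr_t *pa, bool write)
{
	struct my_gpu *gpu = my_gpu_from_vma(vma);
	unsigned long npages = (end - start) >> PAGE_SHIFT;

	mutex_lock(&gpu->p2p_lock);
	if (gpu->p2p_bar_pages + npages > my_gpu_p2p_page_budget(gpu)) {
		/* Too much BAR space is already pinned for peers; refuse
		 * so the importer falls back to main memory. */
		mutex_unlock(&gpu->p2p_lock);
		return -ENOSPC;
	}
	gpu->p2p_bar_pages += npages;
	mutex_unlock(&gpu->p2p_lock);

	/* Make the range resident in a BAR-visible VRAM segment and fill
	 * pa[] with the corresponding bus addresses (not shown). */
	return my_gpu_fill_p2p_addrs(gpu, vma, start, end, pa, write);
}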

Cheers,
Jérôme

Thread overview: 95+ messages
2019-01-29 17:47 [RFC PATCH 0/5] Device peer to peer (p2p) through vma jglisse
2019-01-29 17:47 ` [RFC PATCH 1/5] pci/p2p: add a function to test peer to peer capability jglisse
2019-01-29 18:24   ` Logan Gunthorpe
2019-01-29 19:44     ` Greg Kroah-Hartman
2019-01-29 19:53       ` Jerome Glisse
2019-01-29 20:44       ` Logan Gunthorpe
2019-01-29 21:00         ` Jerome Glisse
2019-01-29 19:56   ` Alex Deucher
2019-01-29 20:00     ` Jerome Glisse
2019-01-29 20:24     ` Logan Gunthorpe
2019-01-29 21:28       ` Alex Deucher
2019-01-30 10:25       ` Christian König
2019-01-29 17:47 ` [RFC PATCH 2/5] drivers/base: " jglisse
2019-01-29 18:26   ` Logan Gunthorpe
2019-01-29 19:54     ` Jerome Glisse
2019-01-29 19:46   ` Greg Kroah-Hartman
2019-01-29 19:56     ` Jerome Glisse
2019-01-29 17:47 ` [RFC PATCH 3/5] mm/vma: add support for peer to peer to device vma jglisse
2019-01-29 18:36   ` Logan Gunthorpe
2019-01-29 19:11     ` Jerome Glisse
2019-01-29 19:24       ` Logan Gunthorpe
2019-01-29 19:44         ` Jerome Glisse
2019-01-29 20:43           ` Logan Gunthorpe
2019-01-30  7:52             ` Christoph Hellwig
2019-01-29 19:32       ` Jason Gunthorpe
2019-01-29 19:50         ` Jerome Glisse [this message]
2019-01-29 20:24           ` Jason Gunthorpe
2019-01-29 20:44             ` Jerome Glisse
2019-01-29 23:02               ` Jason Gunthorpe
2019-01-30  0:08                 ` Jerome Glisse
2019-01-30  4:30                   ` Jason Gunthorpe
2019-01-30 15:43                     ` Jerome Glisse
2019-01-29 20:39         ` Logan Gunthorpe
2019-01-29 20:57           ` Jerome Glisse
2019-01-29 21:30             ` Logan Gunthorpe
2019-01-29 21:50               ` Jerome Glisse
2019-01-29 22:58                 ` Logan Gunthorpe
2019-01-29 23:47                   ` Jerome Glisse
2019-01-30  1:17                     ` Logan Gunthorpe
2019-01-30  2:48                       ` Jerome Glisse
2019-01-30  4:18                       ` Jason Gunthorpe
2019-01-30  8:00                         ` Christoph Hellwig
2019-01-30 15:49                           ` Jerome Glisse
2019-01-30 19:06                           ` Jason Gunthorpe
2019-01-30 19:45                             ` Logan Gunthorpe
2019-01-30 19:59                               ` Jason Gunthorpe
2019-01-30 21:01                                 ` Logan Gunthorpe
2019-01-30 21:50                                   ` Jason Gunthorpe
2019-01-30 22:52                                     ` Logan Gunthorpe
2019-01-30 23:30                                       ` Jason Gunthorpe
2019-01-31  8:13                                       ` Christoph Hellwig
2019-01-31 15:37                                         ` Jerome Glisse
2019-01-31 19:02                                         ` Jason Gunthorpe
2019-01-31 19:19                                           ` Logan Gunthorpe
2019-01-31 19:54                                             ` Jason Gunthorpe
2019-01-31 19:35                                           ` Jerome Glisse
2019-01-31 19:44                                             ` Logan Gunthorpe
2019-01-31 19:58                                             ` Jason Gunthorpe
2019-01-30 17:17                         ` Logan Gunthorpe
2019-01-30 18:56                           ` Jason Gunthorpe
2019-01-30 19:22                             ` Jerome Glisse
2019-01-30 19:38                               ` Jason Gunthorpe
2019-01-30 20:00                                 ` Logan Gunthorpe
2019-01-30 20:11                                   ` Jason Gunthorpe
2019-01-30 20:43                                     ` Jerome Glisse
2019-01-30 20:50                                       ` Jason Gunthorpe
2019-01-30 21:45                                         ` Jerome Glisse
2019-01-30 21:56                                           ` Jason Gunthorpe
2019-01-30 22:30                                             ` Jerome Glisse
2019-01-30 22:33                                               ` Jason Gunthorpe
2019-01-30 22:47                                                 ` Jerome Glisse
2019-01-30 22:51                                                   ` Jason Gunthorpe
2019-01-30 22:58                                                     ` Jerome Glisse
2019-01-30 19:52                               ` Logan Gunthorpe
2019-01-30 20:35                                 ` Jerome Glisse
2019-01-29 20:58           ` Jason Gunthorpe
2019-01-30  8:02             ` Christoph Hellwig
2019-01-30 10:33               ` Koenig, Christian
2019-01-30 15:55                 ` Jerome Glisse
2019-01-30 17:26                   ` Christoph Hellwig
2019-01-30 17:32                     ` Logan Gunthorpe
2019-01-30 17:39                     ` Jason Gunthorpe
2019-01-30 18:05                     ` Jerome Glisse
2019-01-30 17:44               ` Jason Gunthorpe
2019-01-30 18:13                 ` Logan Gunthorpe
2019-01-30 18:50                   ` Jerome Glisse
2019-01-31  8:02                     ` Christoph Hellwig
2019-01-31 15:03                       ` Jerome Glisse
2019-01-30 19:19                   ` Jason Gunthorpe
2019-01-30 19:48                     ` Logan Gunthorpe
2019-01-30 20:44                       ` Jason Gunthorpe
2019-01-31  8:05                         ` Christoph Hellwig
2019-01-31 15:11                           ` Jerome Glisse
2019-01-29 17:47 ` [RFC PATCH 4/5] mm/hmm: add support for peer to peer to HMM device memory jglisse
2019-01-29 17:47 ` [RFC PATCH 5/5] mm/hmm: add support for peer to peer to special device vma jglisse
