From: Jason Gunthorpe <jgg@ziepe.ca>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	nvdimm@lists.linux.dev, lsf-pc@lists.linuxfoundation.org,
	linux-rdma@vger.kernel.org, John Hubbard <jhubbard@nvidia.com>,
	dri-devel@lists.freedesktop.org, Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	iommu@lists.linux.dev, netdev@vger.kernel.org,
	Joao Martins <joao.m.martins@oracle.com>,
	Jason Gunthorpe via Lsf-pc <lsf-pc@lists.linux-foundation.org>,
	Logan Gunthorpe <logang@deltatee.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [Lsf-pc] [LSF/MM/BPF proposal]: Physr discussion
Date: Thu, 26 Jan 2023 15:38:16 -0400	[thread overview]
Message-ID: <Y9LWqEtmkmsMrHne@ziepe.ca> (raw)
In-Reply-To: <63cef32cbafc3_3a36e529465@dwillia2-xfh.jf.intel.com.notmuch>

On Mon, Jan 23, 2023 at 12:50:52PM -0800, Dan Williams wrote:
> Matthew Wilcox wrote:
> > On Mon, Jan 23, 2023 at 11:36:51AM -0800, Dan Williams wrote:
> > > Jason Gunthorpe via Lsf-pc wrote:
> > > > I would like to have a session at LSF to talk about Matthew's
> > > > physr discussion starter:
> > > > 
> > > >  https://lore.kernel.org/linux-mm/YdyKWeU0HTv8m7wD@casper.infradead.org/
> > > > 
> > > > I have become interested in this with some immediacy because of
> > > > IOMMUFD and this other discussion with Christoph:
> > > > 
> > > >  https://lore.kernel.org/kvm/4-v2-472615b3877e+28f7-vfio_dma_buf_jgg@nvidia.com/
> > > 
> > > I think this is a worthwhile discussion. My main hangup with 'struct
> > > page' elimination in general is that if anything needs to be allocated
> > 
> > You're the first one to bring up struct page elimination.  Neither Jason
> > nor I have that as our motivation.
> 
> Oh, ok, then maybe I misread the concern in the vfio discussion. I
> thought the summary there is debating the ongoing requirement for
> 'struct page' for P2PDMA?

The VFIO problem is we need a unique pgmap at 4k granules (or maybe
smaller, technically), tightly packed, because VFIO exposes PCI BAR
space that can be sized in such small amounts.

So, using struct page means some kind of adventure in the memory
hotplug code to allow tightly packed 4k pgmaps.

And that is assuming that every architecture that wants to support
VFIO supports pgmap and memory hotplug. I was just told that s390
doesn't, which is kind of important..

If there is a straightforward way to get a pgmap into VFIO then I'd do
that and give up this quest :)

I've never been looking at this from the angle of eliminating struct
page, but from the perspective of allowing the DMA API to correctly do
scatter/gather IO to non-struct page P2P memory because I *can't* get
a struct page for it. I.e. make dma_map_resource() better. Make P2P
DMABUF work properly.

This has to come along with a different way to store address ranges
because the basic datum that needs to cross all the functional
boundaries we have is an address range list.
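As a rough illustration (hypothetical names, nothing that exists in the
tree - just a sketch of the datum), the address range list could be as
simple as an array of physical ranges that coalesces contiguous entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "physr"-style entry: one physical address range. */
struct phys_range {
	uint64_t addr;	/* physical start address */
	uint64_t len;	/* length in bytes */
};

/*
 * Append a range to the list, merging it into the previous entry when
 * the new range is physically contiguous with it.  Returns the new
 * number of entries; the caller provides storage.
 */
static size_t phys_range_add(struct phys_range *list, size_t n,
			     uint64_t addr, uint64_t len)
{
	if (n && list[n - 1].addr + list[n - 1].len == addr) {
		list[n - 1].len += len;	/* coalesce contiguous ranges */
		return n;
	}
	list[n].addr = addr;
	list[n].len = len;
	return n + 1;
}
```

The point is only that the thing crossing the functional boundaries is
{phys, len} pairs, with no struct page anywhere in the datum.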

My general current sketch is we'd allocate some 'DMA P2P provider'
structure analogous to the MEMORY_DEVICE_PCI_P2PDMA pgmap and a single
provider would cover the entire MMIO aperture - e.g. the providing
device's MMIO BAR. This is enough information for the DMA API to do
its job.

We get this back either by searching an interval-tree-like structure on
the physical address or by storing it directly in the address range list.
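For the first option, a flat sketch of what the lookup would do
(hypothetical types and names, standing in for a real interval tree):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "DMA P2P provider": one per MMIO aperture (e.g. the
 * providing device's whole BAR), playing the role the
 * MEMORY_DEVICE_PCI_P2PDMA pgmap plays today.
 */
struct p2p_provider {
	uint64_t aperture_start;	/* physical base of the aperture */
	uint64_t aperture_len;		/* size of the aperture */
	void *owner;			/* providing device (opaque here) */
};

/*
 * Find the provider covering a physical address by binary search over
 * providers sorted by aperture_start.  Returns NULL if no aperture
 * covers the address.
 */
static const struct p2p_provider *
p2p_provider_find(const struct p2p_provider *tbl, size_t n, uint64_t phys)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (phys < tbl[mid].aperture_start)
			hi = mid;
		else if (phys >= tbl[mid].aperture_start +
				 tbl[mid].aperture_len)
			lo = mid + 1;
		else
			return &tbl[mid];
	}
	return NULL;
}
```

One provider per aperture keeps the lookup structure tiny even when the
BAR is carved into many 4k pieces, which is the whole point versus
per-page metadata.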

Jason

  parent reply	other threads:[~2023-01-26 19:38 UTC|newest]

Thread overview: 27+ messages
2023-01-21 15:03 [LSF/MM/BPF proposal]: Physr discussion Jason Gunthorpe
2023-01-23  4:36 ` Matthew Wilcox
2023-01-23 13:44   ` Jason Gunthorpe
2023-01-23 19:47     ` Bart Van Assche
2023-01-24  6:15       ` Chaitanya Kulkarni
2023-01-26  9:39   ` Mike Rapoport
2023-01-23 19:36 ` [Lsf-pc] " Dan Williams
2023-01-23 20:11   ` Matthew Wilcox
2023-01-23 20:50     ` Dan Williams
2023-01-23 22:46       ` Matthew Wilcox
2023-01-26 19:38       ` Jason Gunthorpe [this message]
2023-01-26  1:45 ` Zhu Yanjun
2023-02-28 20:59 ` T.J. Mercier
2023-04-17 19:59   ` Jason Gunthorpe
