All of lore.kernel.org
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Matthew Wilcox <willy@infradead.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Joao Martins <joao.m.martins@oracle.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-rdma@vger.kernel.org,
	dri-devel@lists.freedesktop.org, nvdimm@lists.linux.dev
Subject: Re: Phyr Starter
Date: Tue, 11 Jan 2022 15:09:13 -0700	[thread overview]
Message-ID: <ef01ce7d-f1d3-0bbb-38ba-2de4d3f7e31a@deltatee.com> (raw)
In-Reply-To: <Yd311C45gpQ3LqaW@casper.infradead.org>



On 2022-01-11 2:25 p.m., Matthew Wilcox wrote:
> That's reproducing the bad decision of the scatterlist, only with
> a different encoding.  You end up with something like:
> 
> struct neoscat {
> 	dma_addr_t dma_addr;
> 	phys_addr_t phys_addr;
> 	size_t dma_len;
> 	size_t phys_len;
> };
> 
> and the dma_addr and dma_len are unused by all-but-the-first entry when
> you have a competent IOMMU.  We want a different data structure in and
> out, and we may as well keep using the scatterlist for the dma-map-out.

With my P2PDMA patchset, even with a competent IOMMU, we need to support
multiple dma_addr/dma_len pairs (plus the flag bit). This is required to
program IOVAs and multiple bus addresses into a single DMA transactions.

I think using the scatter list for the DMA-out side is not ideal seeing
we don't need the page pointers or multiple length fields and we won't
be able to change the sgl substantially given the massive amount of
existing use cases that won't go away over night.

My hope would be along these lines:

struct phy_range {
    phys_addr_t phyr_addr;
    u32 phyr_len;
    u32 phyr_flags;
};

struct dma_range {
    dma_addr_t dmar_addr;
    u32 dmar_len;
    u32 dmar_flags;
};

A new GUP helper would somehow return a list of phy_range structs and
the new dma_map function would take that list and return a list of
dma_range structs. Each element in the list could represent a segment up
to 4GB, so any range longer than that would need multiple items in the
list. (Alternatively, perhaps the length could be a 64bit value and we
steal some of the top bits for flags or some such). The flags would not
only be needed by some of the use cases mentioned (FOLL_PIN or
DMA_BUS_ADDRESS) but could also support chaining these lists like SGLs
so continuous vmallocs would not be necessary for longer lists.

If there's an [phy|dma]_range_list struct (or some such) which contains
these range structs (per some details of Jason's suggestions) that'd be
fine by me too and would just depend on implementation details.

However, the main problem I see is a chicken and egg problem. The new
dma_map function would need to be implemented by every dma_map provider
or any driver trying to use it would need a messy fallback. Either that,
or we need a wrapper that allocates an appropriately sized SGL to pass
to any dma_map implementation that doesn't support the new structures.

Logan

WARNING: multiple messages have this Message-ID (diff)
From: Logan Gunthorpe <logang@deltatee.com>
To: Matthew Wilcox <willy@infradead.org>, Jason Gunthorpe <jgg@nvidia.com>
Cc: nvdimm@lists.linux.dev, linux-rdma@vger.kernel.org,
	John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, Joao Martins <joao.m.martins@oracle.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: Phyr Starter
Date: Tue, 11 Jan 2022 15:09:13 -0700	[thread overview]
Message-ID: <ef01ce7d-f1d3-0bbb-38ba-2de4d3f7e31a@deltatee.com> (raw)
In-Reply-To: <Yd311C45gpQ3LqaW@casper.infradead.org>



On 2022-01-11 2:25 p.m., Matthew Wilcox wrote:
> That's reproducing the bad decision of the scatterlist, only with
> a different encoding.  You end up with something like:
> 
> struct neoscat {
> 	dma_addr_t dma_addr;
> 	phys_addr_t phys_addr;
> 	size_t dma_len;
> 	size_t phys_len;
> };
> 
> and the dma_addr and dma_len are unused by all-but-the-first entry when
> you have a competent IOMMU.  We want a different data structure in and
> out, and we may as well keep using the scatterlist for the dma-map-out.

With my P2PDMA patchset, even with a competent IOMMU, we need to support
multiple dma_addr/dma_len pairs (plus the flag bit). This is required to
program IOVAs and multiple bus addresses into a single DMA transactions.

I think using the scatter list for the DMA-out side is not ideal seeing
we don't need the page pointers or multiple length fields and we won't
be able to change the sgl substantially given the massive amount of
existing use cases that won't go away over night.

My hope would be along these lines:

struct phy_range {
    phys_addr_t phyr_addr;
    u32 phyr_len;
    u32 phyr_flags;
};

struct dma_range {
    dma_addr_t dmar_addr;
    u32 dmar_len;
    u32 dmar_flags;
};

A new GUP helper would somehow return a list of phy_range structs and
the new dma_map function would take that list and return a list of
dma_range structs. Each element in the list could represent a segment up
to 4GB, so any range longer than that would need multiple items in the
list. (Alternatively, perhaps the length could be a 64bit value and we
steal some of the top bits for flags or some such). The flags would not
only be needed by some of the use cases mentioned (FOLL_PIN or
DMA_BUS_ADDRESS) but could also support chaining these lists like SGLs
so continuous vmallocs would not be necessary for longer lists.

If there's an [phy|dma]_range_list struct (or some such) which contains
these range structs (per some details of Jason's suggestions) that'd be
fine by me too and would just depend on implementation details.

However, the main problem I see is a chicken and egg problem. The new
dma_map function would need to be implemented by every dma_map provider
or any driver trying to use it would need a messy fallback. Either that,
or we need a wrapper that allocates an appropriately sized SGL to pass
to any dma_map implementation that doesn't support the new structures.

Logan

  reply	other threads:[~2022-01-11 22:09 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10 19:34 Phyr Starter Matthew Wilcox
2022-01-10 19:34 ` Matthew Wilcox
2022-01-11  0:41 ` Jason Gunthorpe
2022-01-11  0:41   ` Jason Gunthorpe
2022-01-11  4:32   ` Matthew Wilcox
2022-01-11  4:32     ` Matthew Wilcox
2022-01-11 15:01     ` Jason Gunthorpe
2022-01-11 15:01       ` Jason Gunthorpe
2022-01-11 18:33       ` Matthew Wilcox
2022-01-11 18:33         ` Matthew Wilcox
2022-01-11 20:21         ` Jason Gunthorpe
2022-01-11 20:21           ` Jason Gunthorpe
2022-01-11 21:25           ` Matthew Wilcox
2022-01-11 21:25             ` Matthew Wilcox
2022-01-11 22:09             ` Logan Gunthorpe [this message]
2022-01-11 22:09               ` Logan Gunthorpe
2022-01-11 22:57               ` Jason Gunthorpe
2022-01-11 22:57                 ` Jason Gunthorpe
2022-01-11 23:02                 ` Logan Gunthorpe
2022-01-11 23:02                   ` Logan Gunthorpe
2022-01-11 22:53             ` Jason Gunthorpe
2022-01-11 22:53               ` Jason Gunthorpe
2022-01-11 22:57               ` Logan Gunthorpe
2022-01-11 22:57                 ` Logan Gunthorpe
2022-01-11 23:02                 ` Jason Gunthorpe
2022-01-11 23:02                   ` Jason Gunthorpe
2022-01-11 23:08                   ` Logan Gunthorpe
2022-01-11 23:08                     ` Logan Gunthorpe
2022-01-12 18:37               ` Matthew Wilcox
2022-01-12 18:37                 ` Matthew Wilcox
2022-01-12 19:08                 ` Jason Gunthorpe
2022-01-12 19:08                   ` Jason Gunthorpe
2022-01-20 14:03                 ` Christoph Hellwig
2022-01-20 17:17                   ` Jason Gunthorpe
2022-01-20 17:17                     ` Jason Gunthorpe
2022-01-20 14:00       ` Christoph Hellwig
2022-01-11  9:05   ` Daniel Vetter
2022-01-11  9:05     ` Daniel Vetter
2022-01-11 20:26     ` Jason Gunthorpe
2022-01-11 20:26       ` Jason Gunthorpe
2022-01-20 14:09       ` Christoph Hellwig
2022-01-20 13:56   ` Christoph Hellwig
2022-01-20 15:27     ` Keith Busch
2022-01-20 15:27       ` Keith Busch
2022-01-20 15:28       ` Christoph Hellwig
2022-01-20 17:54       ` Robin Murphy
2022-01-11  8:17 ` John Hubbard
2022-01-11  8:17   ` John Hubbard
2022-01-11 14:01   ` Matthew Wilcox
2022-01-11 14:01     ` Matthew Wilcox
2022-01-11 15:02     ` Jason Gunthorpe
2022-01-11 15:02       ` Jason Gunthorpe
2022-01-11 17:31   ` Logan Gunthorpe
2022-01-11 17:31     ` Logan Gunthorpe
2022-01-20 14:12   ` Christoph Hellwig
2022-01-20 21:35     ` John Hubbard
2022-01-20 21:35       ` John Hubbard
2022-01-11 11:40 ` Thomas Zimmermann
2022-01-11 13:56   ` Matthew Wilcox
2022-01-11 13:56     ` Matthew Wilcox
2022-01-11 14:10     ` Thomas Zimmermann
2022-01-20 13:39 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ef01ce7d-f1d3-0bbb-38ba-2de4d3f7e31a@deltatee.com \
    --to=logang@deltatee.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@lst.de \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.