linux-rdma.vger.kernel.org archive mirror
* [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
@ 2024-02-27  8:17 Chaitanya Kulkarni
  2024-02-27 11:30 ` Leon Romanovsky
  2024-03-05 13:03 ` Jason Gunthorpe
  0 siblings, 2 replies; 5+ messages in thread
From: Chaitanya Kulkarni @ 2024-02-27  8:17 UTC
  To: lsf-pc, linux-block, linux-nvme, iommu, linux-rdma, linux-mm
  Cc: Jens Axboe, Bart Van Assche, kbusch, Damien Le Moal,
	Amir Goldstein, josef, Martin K. Petersen, daniel,
	Christoph Hellwig, Dan Williams, jack, Leon Romanovsky,
	Jason Gunthorpe

Hi,

* Problem Statement :-
-------------------------------------------------------------------------
The existing IOMMU DMA mapping operation performs two steps in a
single call (one-shot):
1. Allocates IOVA space.
2. Actually maps DMA pages to that space.
For example, map scatter-gather list:
dma_map_sg_attrs()
   __dma_map_sg_attrs
     ops->map_sg()
       iommu_dma_map_sg()
         Calculate length of IOVA space that is needed

         /* ####### step one allocate IOVA space ####### */
         iommu_dma_alloc_iova()

         /* ####### step two actually map DMA Pages ####### */
         iommu_map_sg()
           for each entry in sg list()
             __iommu_map()
               iommu_domain_ops->map_pages()
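
The call chain above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the toy_* names, the bump allocator, and the flat page table are all hypothetical stand-ins, not kernel code, and entries are assumed to start on a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PAGES 64u
#define TOY_IOVA_BASE 0x100000u

/* Toy stand-in for an IOMMU domain: bump IOVA allocator + flat page table. */
struct toy_iommu {
	size_t used_pages;           /* IOVA pages handed out so far */
	uint64_t pt[TOY_MAX_PAGES];  /* pt[i]: physical page behind IOVA page i */
};

/* Toy stand-in for one scatter-gather entry (hypothetical layout). */
struct toy_sg {
	uint64_t phys;  /* page-aligned physical address */
	size_t len;     /* length in bytes */
};

/* Number of pages needed to cover len bytes. */
static size_t toy_pages(size_t len)
{
	return len / TOY_PAGE_SIZE + (len % TOY_PAGE_SIZE != 0);
}

/*
 * One-shot mapping: size the range, allocate IOVA, and map every page
 * in a single call. Returns the base IOVA, or 0 on failure.
 */
static uint64_t toy_map_sg(struct toy_iommu *mmu, const struct toy_sg *sg,
			   size_t nents)
{
	size_t i, k, p, slot, total = 0;

	/* Calculate the length of IOVA space that is needed. */
	for (i = 0; i < nents; i++)
		total += toy_pages(sg[i].len);

	/* Step one: allocate IOVA space. */
	if (total == 0 || total > TOY_MAX_PAGES - mmu->used_pages)
		return 0;
	slot = mmu->used_pages;
	mmu->used_pages += total;

	/* Step two: map the pages of each entry into that space. */
	for (i = 0, p = slot; i < nents; i++) {
		size_t n = toy_pages(sg[i].len);

		for (k = 0; k < n; k++)
			mmu->pt[p++] = sg[i].phys + (uint64_t)k * TOY_PAGE_SIZE;
	}
	assert(p == slot + total);
	return TOY_IOVA_BASE + (uint64_t)slot * TOY_PAGE_SIZE;
}
```

Note that the caller has no way to perform step one without step two here, which is the coupling the rest of this proposal is about.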

This one-shot operation works perfectly for simple scenarios where
callers use the existing DMA API in the control path when they set up
hardware.

However, in more complex scenarios, when DMA mapping is needed in the
data path and especially when some sort of specific intermediary
datatype is involved (sg list), this one-shot approach:

1. Forces developers to introduce new DMA APIs for each specific datatype,
    e.g., the existing scatter-gather mapping functions in the DMA mapping
    code and in existing subsystems :-

    dma_map_sgtable()
      __dma_map_sg_attrs()
    dma_unmap_sg_attrs()
    blk_rq_map_sg()
      __blk_rq_map_sg()
      __blk_bvec_map_sg()
      __blk_bios_map_sg()
    blk_bvec_map_sg()

    OR

    Chuck's latest RFC series [1] aims to incorporate biovec-related
    DMA mapping (which expands bio_vec with DMA addresses). struct folio
    will probably require the same.

2. Creates dependencies on a data type, forcing certain intermediary
    data type allocation/de-allocation and page-to-data-type mapping
    and unmapping in the fast path (submission or completion).

* Proposed approach and discussion points :-
-------------------------------------------------------------------------

Instead of teaching DMA APIs to know about specific datatypes and creating
a dependency on them, which may add mapping and allocation overhead, we
propose to separate the existing DMA mapping routine into two steps
where:

Step 1 : Provide an option to API users (subsystems) to perform all
          calculations internally in advance.
Step 2 : Map pages when they are needed.

These advanced DMA mapping APIs are needed to calculate the IOVA size to
allocate as one chunk, together with the offset calculations that
determine which part of the IOVA is mapped to which page.
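
A minimal userspace sketch of the two steps may help make the split concrete. Everything here is an assumption for illustration: the ts_* names and the state structure are hypothetical and are not the interface of the upcoming RFC; the point is only that the IOVA chunk is sized and allocated once, and pages are linked at offsets inside it later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_PAGE_SIZE 4096u
#define TS_MAX_PAGES 64u
#define TS_IOVA_BASE 0x100000u

/* Hypothetical state the caller keeps between the two steps. */
struct ts_iova {
	uint64_t base;              /* start of the preallocated IOVA chunk */
	size_t npages;              /* size of the chunk in pages */
	uint64_t pt[TS_MAX_PAGES];  /* pt[i]: page linked at slot i, 0 if none */
};

/* Step 1: allocate the whole IOVA range as one chunk; map nothing yet. */
static int ts_alloc_iova(struct ts_iova *st, size_t bytes)
{
	size_t npages = bytes / TS_PAGE_SIZE + (bytes % TS_PAGE_SIZE != 0);
	size_t i;

	if (npages == 0 || npages > TS_MAX_PAGES)
		return -1;
	st->base = TS_IOVA_BASE;
	st->npages = npages;
	for (i = 0; i < TS_MAX_PAGES; i++)
		st->pt[i] = 0;
	return 0;
}

/*
 * Step 2: link one page at a byte offset inside the chunk.
 * Returns the DMA address, or 0 if the offset is unaligned or out of range.
 */
static uint64_t ts_link_page(struct ts_iova *st, size_t offset, uint64_t phys)
{
	size_t idx = offset / TS_PAGE_SIZE;

	if (offset % TS_PAGE_SIZE != 0 || idx >= st->npages)
		return 0;
	st->pt[idx] = phys;
	return st->base + offset;
}

/* Undo step 2 for one page; the IOVA chunk itself stays allocated. */
static void ts_unlink_page(struct ts_iova *st, size_t offset)
{
	size_t idx = offset / TS_PAGE_SIZE;

	assert(offset % TS_PAGE_SIZE == 0);
	if (idx < st->npages)
		st->pt[idx] = 0;
}
```

In this model a page fault handler, for example, would call only ts_link_page() on the hot path, since the range was already sized and allocated in step 1.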

The new API will also allow us to remove the dependency on the sg list as
discussed previously in [2].

The main advantages of this approach, as seen in the upcoming RFC
series, are:

1. Simpler and faster page fault handling for On-Demand-Paging (ODP)
    mode for RDMA.
2. Reduced memory footprint for VFIO PCI live migration code.
3. Reduced overhead of intermediary SG table manipulation in the fast
    path for storage drivers, where block layer requests are mapped onto
    an sg table and the sg table is then mapped onto DMA :-
    xxx_queue_rq()
     allocate sg table
     blk_rq_map_sg()
       merges and maps bvecs to sg
     dma_map_sgtable()
      maps pages in sg to DMA.
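
For the storage case in item 3, the following userspace sketch shows the idea of walking a request's segments once and placing each at an offset inside an IOVA chunk that was allocated earlier, with no temporary sg table. The rq_* names and the segment layout are hypothetical, and each segment is assumed to be page aligned and padded to a whole page; the actual RFC interface is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RQ_PAGE_SIZE 4096u

/* Toy stand-in for a bio_vec-like segment of a block request. */
struct rq_seg {
	uint64_t phys;  /* page-aligned physical address */
	uint32_t len;   /* length in bytes */
};

/*
 * Walk the segments once and assign each a DMA address inside an IOVA
 * chunk [iova_base, iova_base + iova_len) that was allocated in advance.
 * No intermediate table is built. Returns the number of IOVA bytes
 * consumed, or 0 if the chunk is too small (or there are no segments).
 */
static size_t rq_link_segs(uint64_t iova_base, size_t iova_len,
			   const struct rq_seg *seg, size_t nsegs,
			   uint64_t *dma_addr)
{
	size_t i, off = 0;

	for (i = 0; i < nsegs; i++) {
		/* Round the segment length up to a whole number of pages. */
		size_t padded = ((size_t)seg[i].len + RQ_PAGE_SIZE - 1) /
				RQ_PAGE_SIZE * RQ_PAGE_SIZE;

		if (padded > iova_len - off)
			return 0;
		/* A real driver would link seg[i].phys at this IOVA here. */
		dma_addr[i] = iova_base + off;
		off += padded;
		assert(off <= iova_len);
	}
	return off;
}
```

Compared with the flow in item 3, the sg table allocation and the blk_rq_map_sg() pass have no counterpart in this model; whether the real savings match depends on the driver.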

In order to create a good platform for a concrete and meaningful
discussion at LSFMM 24, we plan to post an RFC within the next two weeks.

Required Attendees list :-

Christoph Hellwig
Jason Gunthorpe
Jens Axboe
Chuck Lever
David Howells
Keith Busch
Bart Van Assche
Damien Le Moal
Martin Petersen

-ck

[1] https://lore.kernel.org/all/169772852492.5232.17148564580779995849.stgit@klimt.1015granger.net
[2] https://lore.kernel.org/linux-iommu/20200708065014.GA5694@lst.de/





* Re: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
  2024-02-27  8:17 [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations Chaitanya Kulkarni
@ 2024-02-27 11:30 ` Leon Romanovsky
  2024-03-03 16:43   ` Zhu Yanjun
  2024-03-05 13:03 ` Jason Gunthorpe
  1 sibling, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2024-02-27 11:30 UTC
  To: Chaitanya Kulkarni
  Cc: lsf-pc, linux-block, linux-nvme, iommu, linux-rdma, linux-mm,
	Jens Axboe, Bart Van Assche, kbusch, Damien Le Moal,
	Amir Goldstein, josef, Martin K. Petersen, daniel,
	Christoph Hellwig, Dan Williams, jack, Jason Gunthorpe,
	Chuck Lever

On Tue, Feb 27, 2024 at 08:17:27AM +0000, Chaitanya Kulkarni wrote:
> Hi,

<...>

> In order to create a good platform for a concrete and meaningful
> discussion at LSFMM 24, we plan to post an RFC within the next two weeks.

The code can be found here https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dma-split

Thanks



* Re: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
  2024-02-27 11:30 ` Leon Romanovsky
@ 2024-03-03 16:43   ` Zhu Yanjun
  2024-03-04  2:27     ` Zhu Yanjun
  0 siblings, 1 reply; 5+ messages in thread
From: Zhu Yanjun @ 2024-03-03 16:43 UTC
  To: Leon Romanovsky, Chaitanya Kulkarni
  Cc: lsf-pc, linux-block, linux-nvme, iommu, linux-rdma, linux-mm,
	Jens Axboe, Bart Van Assche, kbusch, Damien Le Moal,
	Amir Goldstein, josef, Martin K. Petersen, daniel,
	Christoph Hellwig, Dan Williams, jack, Jason Gunthorpe,
	Chuck Lever

On 27.02.24 12:30, Leon Romanovsky wrote:
> On Tue, Feb 27, 2024 at 08:17:27AM +0000, Chaitanya Kulkarni wrote:
>> Hi,
> 
> <...>
> 
>> In order to create a good platform for a concrete and meaningful
>> discussion at LSFMM 24, we plan to post an RFC within the next two weeks.
> 
> The code can be found here https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dma-split

Thanks a lot. I will delve into it. An interesting topic.

Zhu Yanjun




* Re: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
  2024-03-03 16:43   ` Zhu Yanjun
@ 2024-03-04  2:27     ` Zhu Yanjun
  0 siblings, 0 replies; 5+ messages in thread
From: Zhu Yanjun @ 2024-03-04  2:27 UTC
  To: Zhu Yanjun, Leon Romanovsky, Chaitanya Kulkarni
  Cc: lsf-pc, linux-block, linux-nvme, iommu, linux-rdma, linux-mm,
	Jens Axboe, Bart Van Assche, kbusch, Damien Le Moal,
	Amir Goldstein, josef, Martin K. Petersen, daniel,
	Christoph Hellwig, Dan Williams, jack, Jason Gunthorpe,
	Chuck Lever

On 2024/3/3 17:43, Zhu Yanjun wrote:
> On 27.02.24 12:30, Leon Romanovsky wrote:
>> On Tue, Feb 27, 2024 at 08:17:27AM +0000, Chaitanya Kulkarni wrote:
>>> Hi,
>>
>> <...>
>>
>>> In order to create a good platform for a concrete and meaningful
>>> discussion at LSFMM 24, we plan to post an RFC within the next two 
>>> weeks.
>>
>> The code can be found here 
>> https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dma-split
> 
> Thanks a lot. I will delve into it. An interesting topic.

The commits should be the following. I am interested in them.

5d8f8f35859c (HEAD -> dma-split, origin/dma-split) cover-letter: Split IOMMU DMA mapping operation to two steps
3beffcde0c12 vfio/mlx5: Convert vfio to use DMA link API
acdfef1ccbcb vfio/mlx5: Explicitly store page list
f16314362e66 vfio/mlx5: Rewrite create mkey flow to allow better code reuse
763e753cd6ed vfio/mlx5: Explicitly use number of pages instead of allocated length
7f58ebf0cfc4 RDMA/umem: Prevent UMEM ODP creation with SWIOTLB
f1c687fde096 RDMA/core: Separate DMA mapping to caching IOVA and page linkage
ffc81619c60d RDMA/umem: Store ODP access mask information in PFN
67038d9e24fd RDMA/umem: Preallocate and cache IOVA for UMEM ODP
ce141bccd409 iommu/dma: Implement link/unlink page callbacks
1dd12d4a44d1 iommu/dma: Prepare map/unmap page functions to receive IOVA
b9714667f54f iommu/dma: Provide an interface to allow preallocate IOVA
21dbfc7fc2f1 dma-mapping: provide callbacks to link/unlink pages to specific IOVA
52689a26b87a dma-mapping: provide an interface to allocate IOVA
34f8a8baecaa mm/hmm: let users to tag specific PFNs

Zhu Yanjun




* Re: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
  2024-02-27  8:17 [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations Chaitanya Kulkarni
  2024-02-27 11:30 ` Leon Romanovsky
@ 2024-03-05 13:03 ` Jason Gunthorpe
  1 sibling, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2024-03-05 13:03 UTC
  To: Chaitanya Kulkarni
  Cc: lsf-pc, linux-block, linux-nvme, iommu, linux-rdma, linux-mm,
	Jens Axboe, Bart Van Assche, kbusch, Damien Le Moal,
	Amir Goldstein, josef, Martin K. Petersen, daniel,
	Christoph Hellwig, Dan Williams, jack, Leon Romanovsky

On Tue, Feb 27, 2024 at 08:17:27AM +0000, Chaitanya Kulkarni wrote:
> Hi,
> 
> * Problem Statement :-
> -------------------------------------------------------------------------
> The existing IOMMU DMA mapping operation is performed in two steps at the
> same time (one-shot):
> 1. Allocates IOVA space.
> 2. Actually maps DMA pages to that space.
> For example, map scatter-gather list:

For clarity, this has come out of last year's topic on the "physr" - we
agreed on a general direction where, instead of adding a parallel DMA
API surface for a new scatterlist alternative, we'd improve the DMA API
so callers can efficiently bring their own data structure.

https://lore.kernel.org/linux-rdma/Y8v+qVZ8OmodOCQ9@nvidia.com/

Jason
