linux-rdma.vger.kernel.org archive mirror
From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	Bart Van Assche <bvanassche@acm.org>,
	"kbusch@kernel.org" <kbusch@kernel.org>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Amir Goldstein <amir73il@gmail.com>,
	"josef@toxicpanda.com" <josef@toxicpanda.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	"jack@suse.com" <jack@suse.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
Date: Tue, 27 Feb 2024 08:17:27 +0000	[thread overview]
Message-ID: <97f385db-42c9-4c04-8fba-9b1ba8ffc525@nvidia.com> (raw)

Hi,

* Problem Statement :-
-------------------------------------------------------------------------
The existing IOMMU DMA mapping operation performs two steps in a single
(one-shot) call:
1. Allocate IOVA space.
2. Map the DMA pages into that space.
For example, mapping a scatter-gather list:
dma_map_sg_attrs()
   __dma_map_sg_attrs
     ops->map_sg()
       iommu_dma_map_sg()
         Calculate the length of IOVA space that is needed

         /* ####### step one allocate IOVA space ####### */
         iommu_dma_alloc_iova()

         /* ####### step two actually map DMA Pages ####### */
         iommu_map_sg()
           for each entry in sg list()
             __iommu_map()
               iommu_domain_ops->map_pages()

This one-shot operation works well for simple scenarios where callers
use the existing DMA API in the control path while they set up
hardware.

However, in more complex scenarios, where DMA mapping is needed in the
data path, and especially when a specific intermediary datatype (such
as an sg list) is involved, this one-shot approach:

1. Forces developers to introduce new DMA APIs for each specific
    datatype, e.g., the existing scatter-gather mapping functions
    spread across subsystems :-

    dma_map_sgtable()
      __dma_map_sg_attrs()
    dma_unmap_sg_attrs()
    blk_rq_map_sg()
      __blk_rq_map_sg()
      __blk_bvec_map_sg()
      __blk_bios_map_sg()
    blk_bvec_map_sg()

    OR

    Chuck Lever's latest RFC series [1], which aims to incorporate
    biovec-based DMA mapping (expanding struct bio_vec with DMA
    addresses). struct folio will probably require the same treatment.

2. Creates a dependency on the data type, forcing allocation and
    de-allocation of the intermediary data type, plus page-to-data-type
    mapping and unmapping, in the fast path (submission or completion).

* Proposed approach and discussion points :-
-------------------------------------------------------------------------

Instead of teaching the DMA API about specific datatypes and creating a
dependency on them, which may add performance overhead from mapping and
allocation, we propose splitting the existing DMA mapping routine into
two steps:

Step 1 : Provide an option for API users (subsystems) to perform all
          calculations internally, in advance.
Step 2 : Map pages when they are needed.

These advanced DMA mapping APIs need to calculate the size of the IOVA
range to allocate as one chunk, plus the offset calculations that
determine which part of the IOVA range maps to which page.

The new API will also allow us to remove the dependency on the sg list as
discussed previously in [2].

The main advantages of this approach, as seen in the upcoming RFC
series, are:

1. Simplified & increased performance in page fault handling for
    On-Demand-Paging (ODP) mode for RDMA.
2. Reduced memory footprint for VFIO PCI live migration code.
3. Reduced overhead of intermediary sg table manipulation in the fast
    path for storage drivers, where block layer requests are first
    mapped onto an sg table and the sg table is then mapped for DMA :-
    xxx_queue_rq()
      allocate sg table
      blk_rq_map_sg()
        merge and map bvecs to sg entries
      dma_map_sgtable()
        map pages in sg table for DMA

To create a good platform for a concrete and meaningful discussion at
LSF/MM/BPF 2024, we plan to post an RFC within the next two weeks.

Required Attendees list :-

Christoph Hellwig
Jason Gunthorpe
Jens Axboe
Chuck Lever
David Howells
Keith Busch
Bart Van Assche
Damien Le Moal
Martin Petersen

-ck

[1] https://lore.kernel.org/all/169772852492.5232.17148564580779995849.stgit@klimt.1015granger.net
[2] https://lore.kernel.org/linux-iommu/20200708065014.GA5694@lst.de/




Thread overview: 5+ messages
2024-02-27  8:17 Chaitanya Kulkarni [this message]
2024-02-27 11:30 ` [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations Leon Romanovsky
2024-03-03 16:43   ` Zhu Yanjun
2024-03-04  2:27     ` Zhu Yanjun
2024-03-05 13:03 ` Jason Gunthorpe
