From: Chaitanya Kulkarni <chaitanyak@nvidia.com>
To: "lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
linux-rdma <linux-rdma@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: Jens Axboe <axboe@kernel.dk>,
Bart Van Assche <bvanassche@acm.org>,
"kbusch@kernel.org" <kbusch@kernel.org>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Amir Goldstein <amir73il@gmail.com>,
"josef@toxicpanda.com" <josef@toxicpanda.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
"jack@suse.com" <jack@suse.com>,
Leon Romanovsky <leonro@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: [LSF/MM/BPF TOPIC] [LSF/MM/BPF ATTEND] : Two stage IOMMU DMA mapping operations
Date: Tue, 27 Feb 2024 08:17:27 +0000 [thread overview]
Message-ID: <97f385db-42c9-4c04-8fba-9b1ba8ffc525@nvidia.com> (raw)
Hi,
* Problem Statement :-
-------------------------------------------------------------------------
The existing IOMMU DMA mapping operation performs two steps in a single
(one-shot) call:
1. Allocate IOVA space.
2. Actually map DMA pages into that space.
For example, mapping a scatter-gather list:
dma_map_sg_attrs()
__dma_map_sg_attrs
ops->map_sg()
iommu_dma_map_sg()
Calculate length of IOVA space that is needed
/* ####### step one allocate IOVA space ####### */
iommu_dma_alloc_iova()
/* ####### step two actually map DMA Pages ####### */
iommu_map_sg()
for each entry in sg list()
__iommu_map()
iommu_domain_ops->map_pages()
This one-shot operation works well for simple scenarios where callers use
the existing DMA API in the control path when they set up hardware.
However, in more complex scenarios, when DMA mapping is needed in the
data path, and especially when some specific intermediary datatype is
involved (e.g., an sg list), this one-shot approach:
1. Forces developers to introduce new DMA APIs for each specific
   datatype, e.g., the existing scatter-gather mapping functions in the
   DMA mapping code and its subsystems:
dma_map_sgtable()
__dma_map_sg_attrs()
dma_unmap_sg_attrs()
blk_rq_map_sg()
__blk_rq_map_sg()
__blk_bvec_map_sg()
__blk_bios_map_sg()
blk_bvec_map_sg()
OR
Chuck Lever's recent RFC series [1], which aims to incorporate
biovec-based DMA mapping (expanding struct bio_vec with DMA
addresses). struct folio will probably also require something similar.
2. Creates a dependency on a datatype, forcing allocation/de-allocation
   of that intermediary datatype, plus page-to-datatype mapping and
   unmapping, in the fast path (submission or completion).
* Proposed approach and discussion points :-
-------------------------------------------------------------------------
Instead of teaching the DMA APIs about specific datatypes, and creating
a dependency on them that may add mapping and allocation overhead, we
propose to split the existing DMA mapping routine into two steps:
Step 1 : Give API users (subsystems) the option to perform all
calculations internally, in advance.
Step 2 : Map pages only when they are needed.
These advanced DMA mapping APIs need to calculate the IOVA size to
allocate as one chunk, plus the offset calculations that determine
which part of the IOVA is mapped to which page.
The new API will also allow us to remove the dependency on the sg list as
discussed previously in [2].
The main advantages of this approach, as seen in the upcoming RFC
series, are:
1. Simplified & increased performance in page fault handling for
On-Demand-Paging (ODP) mode for RDMA.
2. Reduced memory footprint for VFIO PCI live migration code.
3. Reduced overhead of intermediary sg table manipulation in the fast
   path for storage drivers, where block layer requests are first mapped
   onto an sg table and the sg table is then mapped for DMA:
xxx_queue_rq()
  allocate sg table
  blk_rq_map_sg()
    merges and maps bvecs into the sg table
  dma_map_sgtable()
    maps the pages in the sg table for DMA
To lay the groundwork for a concrete and meaningful discussion at
LSF/MM/BPF 2024, we plan to post an RFC within the next two weeks.
Required Attendees list :-
Christoph Hellwig
Jason Gunthorpe
Jens Axboe
Chuck Lever
David Howells
Keith Busch
Bart Van Assche
Damien Le Moal
Martin Petersen
-ck
[1]
https://lore.kernel.org/all/169772852492.5232.17148564580779995849.stgit@klimt.1015granger.net
[2] https://lore.kernel.org/linux-iommu/20200708065014.GA5694@lst.de/