From: Lu Baolu <baolu.lu@linux.intel.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Robin Murphy <robin.murphy@arm.com>,
iommu@lists.linux-foundation.org, Will Deacon <will@kernel.org>
Subject: [RFT PATCH 3/3] iommu/vt-d: Add iotlb_sync_map callback
Date: Mon, 25 Jan 2021 10:38:58 +0800
Message-ID: <20210125023858.570175-4-baolu.lu@linux.intel.com>
In-Reply-To: <20210125023858.570175-1-baolu.lu@linux.intel.com>

Some Intel VT-d hardware implementations don't support memory coherency
for page table walks (indicated by the Page-walk Coherency bit in the
ecap register), so software must flush the corresponding CPU cache
lines explicitly after each page table entry update.

The iommu_map_sg() code iterates through the given scatter-gather list
and invokes iommu_map() for each element, which calls into the vendor
IOMMU driver through the iommu_ops callback. As a result, a single sg
mapping may lead to multiple cache line flushes, which has degraded I/O
performance since commit c588072bba6b5 ("iommu/vt-d: Convert intel
iommu driver to the iommu ops").

Fix this by adding an iotlb_sync_map callback and centralizing the
clflush operations after all sg mappings.

Fixes: c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/linux-iommu/D81314ED-5673-44A6-B597-090E3CB83EB0@oracle.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Robin Murphy <robin.murphy@arm.com>
---
drivers/iommu/intel/iommu.c | 86 +++++++++++++++++++++++++------------
1 file changed, 58 insertions(+), 28 deletions(-)
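
Illustrative note (not part of the patch): a rough user-space model of why
deferring the flush helps. This is only a sketch under stated assumptions:
8-byte PTEs, a 64-byte cache line, 4 KiB mappings only (no superpages), and
one contiguous IOVA range starting at PTE slot 0. The names lines_spanned(),
flushes_per_element() and flushes_batched() are made up for this example and
do not exist in the kernel.

```c
#include <stddef.h>

#define PTE_SIZE        8UL	/* a VT-d page table entry is 64 bits */
#define CACHE_LINE_SIZE 64UL	/* assumed CPU cache line size */

/* Number of cache lines covered by PTE slots [first, first + count). */
static unsigned long lines_spanned(unsigned long first, unsigned long count)
{
	unsigned long start, end;

	if (!count)
		return 0;

	start = (first * PTE_SIZE) / CACHE_LINE_SIZE;
	end = ((first + count) * PTE_SIZE - 1) / CACHE_LINE_SIZE;

	return end - start + 1;
}

/* Model of flushing after each sg element has been mapped. */
static unsigned long flushes_per_element(const unsigned long *sg_pages,
					 size_t nents)
{
	unsigned long slot = 0, flushes = 0;
	size_t i;

	for (i = 0; i < nents; i++) {
		flushes += lines_spanned(slot, sg_pages[i]);
		slot += sg_pages[i];
	}

	return flushes;
}

/* Model of mapping every element first, then flushing the range once. */
static unsigned long flushes_batched(const unsigned long *sg_pages,
				     size_t nents)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nents; i++)
		total += sg_pages[i];

	return lines_spanned(0, total);
}
```

In this model, sixteen single-page sg elements cost 16 line flushes when
flushed per element, but only 2 when flushed once at the end, because eight
PTEs share each cache line. Elements that already fill whole cache lines
(for example 8 pages each) see no difference.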
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f665322a0991..f5a236e63ded 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2383,34 +2383,14 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		 * recalculate 'pte' and switch back to smaller pages for the
 		 * end of the mapping, if the trailing size is not enough to
 		 * use another superpage (i.e. nr_pages < lvl_pages).
+		 *
+		 * We leave clflush for the leaf pte changes to iotlb_sync_map()
+		 * callback.
 		 */
 		pte++;
 		if (!nr_pages || first_pte_in_page(pte) ||
-		    (largepage_lvl > 1 && nr_pages < lvl_pages)) {
-			domain_flush_cache(domain, first_pte,
-					   (void *)pte - (void *)first_pte);
+		    (largepage_lvl > 1 && nr_pages < lvl_pages))
 			pte = NULL;
-		}
-	}
-
-	return 0;
-}
-
-static int
-domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
-	       unsigned long phys_pfn, unsigned long nr_pages, int prot)
-{
-	int iommu_id, ret;
-	struct intel_iommu *iommu;
-
-	/* Do the real mapping first */
-	ret = __domain_mapping(domain, iov_pfn, phys_pfn, nr_pages, prot);
-	if (ret)
-		return ret;
-
-	for_each_domain_iommu(iommu_id, domain) {
-		iommu = g_iommus[iommu_id];
-		__mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
 	}

 	return 0;
@@ -4943,7 +4923,6 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
 	u64 max_addr;
 	int prot = 0;
-	int ret;

 	if (iommu_prot & IOMMU_READ)
 		prot |= DMA_PTE_READ;
@@ -4969,9 +4948,8 @@ static int intel_iommu_map(struct iommu_domain *domain,
 	/* Round up size to next multiple of PAGE_SIZE, if it and
 	   the low bits of hpa would take us onto the next page */
 	size = aligned_nrpages(hpa, size);
-	ret = domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT,
-			     hpa >> VTD_PAGE_SHIFT, size, prot);
-	return ret;
+	return __domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT,
+				hpa >> VTD_PAGE_SHIFT, size, prot);
 }

 static size_t intel_iommu_unmap(struct iommu_domain *domain,
@@ -5478,6 +5456,57 @@ static bool risky_device(struct pci_dev *pdev)
 	return false;
 }

+static void clflush_sync_map(struct dmar_domain *domain, unsigned long clf_pfn,
+			     unsigned long clf_pages)
+{
+	struct dma_pte *first_pte = NULL, *pte = NULL;
+	unsigned long lvl_pages = 0;
+	int level = 0;
+
+	while (clf_pages > 0) {
+		if (!pte) {
+			level = 0;
+			pte = pfn_to_dma_pte(domain, clf_pfn, &level);
+			if (WARN_ON(!pte))
+				return;
+			first_pte = pte;
+			lvl_pages = lvl_to_nr_pages(level);
+		}
+
+		if (WARN_ON(!lvl_pages || clf_pages < lvl_pages))
+			return;
+
+		clf_pages -= lvl_pages;
+		clf_pfn += lvl_pages;
+		pte++;
+
+		if (!clf_pages || first_pte_in_page(pte) ||
+		    (level > 1 && clf_pages < lvl_pages)) {
+			domain_flush_cache(domain, first_pte,
+					   (void *)pte - (void *)first_pte);
+			pte = NULL;
+		}
+	}
+}
+
+static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long pages = aligned_nrpages(iova, size);
+	unsigned long pfn = iova >> VTD_PAGE_SHIFT;
+	struct intel_iommu *iommu;
+	int iommu_id;
+
+	if (!dmar_domain->iommu_coherency)
+		clflush_sync_map(dmar_domain, pfn, pages);
+
+	for_each_domain_iommu(iommu_id, dmar_domain) {
+		iommu = g_iommus[iommu_id];
+		__mapping_notify_one(iommu, dmar_domain, pfn, pages);
+	}
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -5490,6 +5519,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.aux_detach_dev		= intel_iommu_aux_detach_device,
 	.aux_get_pasid		= intel_iommu_aux_get_pasid,
 	.map			= intel_iommu_map,
+	.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
 	.unmap			= intel_iommu_unmap,
 	.flush_iotlb_all	= intel_flush_iotlb_all,
 	.iotlb_sync		= intel_iommu_tlb_sync,
--
2.25.1