linux-renesas-soc.vger.kernel.org archive mirror
* [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance
@ 2019-06-05 11:11 Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous Yoshihiro Shimoda
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch series is based on iommu.git / next branch.

Since the SDHI host's internal DMAC on R-Car Gen3 cannot handle two or
more segments, performance (especially eMMC HS400 read throughput)
suffers. However, if an IOMMU is enabled for the DMAC, the IOMMU maps
multiple scatter-gather buffers as one contiguous iova, the DMAC can
handle that iova as a single segment, and performance improves. I
measured this with bonnie++: the "Sequential Input - block" rate
improved on r8a7795.

To achieve this, this patch series first modifies the DMA mapping and
IOMMU subsystems. Since I would like feedback from each subsystem on
whether this approach is acceptable upstream, I am submitting the
series treewide as an RFC.

Changes from v4:
 - [DMA MAPPING] Add a new device_dma_parameters field for strictly contiguous iova.
 - [IOMMU] Add a new capability for "merging" segments.
 - [IOMMU] Add a .capable ops to the ipmmu-vmsa driver.
 - [MMC] Sort headers in renesas_sdhi_core.c.
 - [MMC] Remove the following code, added in v3, which is now handled by the
	 DMA mapping and IOMMU subsystems:
 -- Check whether the R-Car Gen3 IPMMU is used on patch 3.
 -- Check whether all multiple-segment buffers are aligned to PAGE_SIZE on patch 3.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=125593

Changes from v3:
 - Use the helper function device_iommu_mapped() on patches 1 and 3.
 - Check whether the R-Car Gen3 IPMMU is used on patch 3.
 - Check whether all multiple-segment buffers are aligned to PAGE_SIZE on patch 3.
 - Add Wolfram-san's Reviewed-by on patches 1 and 2. Note that I also got his
   Reviewed-by on patch 3, but that patch changed from v2, so I did not carry
   his Reviewed-by over this time.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=120985

Changes from v2:
 - Add some conditions to init_card().
 - Add a comment to init_card().
 - Add definitions for some "MAX_SEGS" values.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=116729

Changes from v1:
 - Drop the init_card ops from struct tmio_mmc_dma_ops and tmio_mmc_host,
   and just set init_card in renesas_sdhi_core.c.
 - Fix typos in "mmc: tmio: No memory size limitation if runs on IOMMU".
 - Add Simon-san's Reviewed-by on a tmio patch.
https://patchwork.kernel.org/project/linux-renesas-soc/list/?series=110485


Yoshihiro Shimoda (8):
  dma-mapping: add a device driver helper for iova contiguous
  iommu/dma: move iommu_dma_unmap_sg() place
  iommu: add a new capable IOMMU_CAP_MERGING
  iommu/ipmmu-vmsa: add capable ops
  mmc: tmio: No memory size limitation if runs on IOMMU
  mmc: tmio: Add a definition for default max_segs
  mmc: renesas_sdhi: sort headers
  mmc: renesas_sdhi: use multiple segments if possible

 drivers/iommu/dma-iommu.c                     | 74 +++++++++++++++++----------
 drivers/iommu/ipmmu-vmsa.c                    | 13 +++++
 drivers/mmc/host/renesas_sdhi_core.c          | 43 +++++++++++++---
 drivers/mmc/host/renesas_sdhi_internal_dmac.c |  4 ++
 drivers/mmc/host/tmio_mmc.h                   |  1 +
 drivers/mmc/host/tmio_mmc_core.c              |  7 +--
 include/linux/device.h                        |  1 +
 include/linux/dma-mapping.h                   | 16 ++++++
 include/linux/iommu.h                         |  1 +
 9 files changed, 123 insertions(+), 37 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place Yoshihiro Shimoda
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

Add helper functions to get and set a flag indicating whether a device
strictly requires contiguous iova mappings.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264f..a33d611 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -752,6 +752,7 @@ struct device_dma_parameters {
 	 */
 	unsigned int max_segment_size;
 	unsigned long segment_boundary_mask;
+	bool iova_contiguous;
 };
 
 /**
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6309a72..cdb4e75 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -729,6 +729,22 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline bool dma_get_iova_contiguous(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->iova_contiguous;
+	return false;
+}
+
+static inline int dma_set_iova_contiguous(struct device *dev, bool contiguous)
+{
+	if (dev->dma_parms) {
+		dev->dma_parms->iova_contiguous = contiguous;
+		return 0;
+	}
+	return -EIO;
+}
+
 #ifndef dma_max_pfn
 static inline unsigned long dma_max_pfn(struct device *dev)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING Yoshihiro Shimoda
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

iommu_dma_map_sg() will use the unmap function in a later patch. Move
iommu_dma_unmap_sg() earlier in the file to avoid a forward declaration.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/iommu/dma-iommu.c | 48 +++++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0dee374..034caae 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -730,6 +730,30 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 	__iommu_dma_unmap(dev, dma_handle, size);
 }
 
+static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+		int nents, enum dma_data_direction dir, unsigned long attrs)
+{
+	dma_addr_t start, end;
+	struct scatterlist *tmp;
+	int i;
+
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
+
+	/*
+	 * The scatterlist segments are mapped into a single
+	 * contiguous IOVA allocation, so this is incredibly easy.
+	 */
+	start = sg_dma_address(sg);
+	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
+		if (sg_dma_len(tmp) == 0)
+			break;
+		sg = tmp;
+	}
+	end = sg_dma_address(sg) + sg_dma_len(sg);
+	__iommu_dma_unmap(dev, start, end - start);
+}
+
 /*
  * Prepare a successfully-mapped scatterlist to give back to the caller.
  *
@@ -887,30 +911,6 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	return 0;
 }
 
-static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-	dma_addr_t start, end;
-	struct scatterlist *tmp;
-	int i;
-
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
-		iommu_dma_sync_sg_for_cpu(dev, sg, nents, dir);
-
-	/*
-	 * The scatterlist segments are mapped into a single
-	 * contiguous IOVA allocation, so this is incredibly easy.
-	 */
-	start = sg_dma_address(sg);
-	for_each_sg(sg_next(sg), tmp, nents - 1, i) {
-		if (sg_dma_len(tmp) == 0)
-			break;
-		sg = tmp;
-	}
-	end = sg_dma_address(sg) + sg_dma_len(sg);
-	__iommu_dma_unmap(dev, start, end - start);
-}
-
 static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 12:21   ` Robin Murphy
  2019-06-05 16:17   ` Sergei Shtylyov
  2019-06-05 11:11 ` [RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops Yoshihiro Shimoda
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch adds a new capable IOMMU_CAP_MERGING to check whether
the IOVA would be contiguous strictly if a device requires and
the IOMMU driver has the capable.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/iommu/dma-iommu.c | 26 ++++++++++++++++++++++++--
 include/linux/iommu.h     |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 034caae..ecf1a04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	dma_addr_t iova;
 	size_t iova_len = 0;
 	unsigned long mask = dma_get_seg_boundary(dev);
-	int i;
+	int i, ret;
+	bool iova_contiguous = false;
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
 
+	if (dma_get_iova_contiguous(dev) &&
+	    iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+		iova_contiguous = true;
+
 	/*
 	 * Work out how much IOVA space we need, and align the segments to
 	 * IOVA granules for the IOMMU driver to handle. With some clever
@@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		sg_dma_len(s) = s_length;
 		s->offset -= s_iova_off;
 		s_length = iova_align(iovad, s_length + s_iova_off);
+		/*
+		 * Check that the IOVA would be strictly contiguous if
+		 * the device requires it and the IOMMU driver is capable.
+		 */
+		if (iova_contiguous && i > 0 &&
+		    (s_iova_off || s->length != s_length))
+			return 0;
 		s->length = s_length;
 
 		/*
@@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
 		goto out_free_iova;
 
-	return __finalise_sg(dev, sg, nents, iova);
+	ret = __finalise_sg(dev, sg, nents, iova);
+	/*
+	 * Check that the list was merged into a single sg entry if the
+	 * device requires it and the IOMMU driver is capable.
+	 */
+	if (iova_contiguous && ret != 1)
+		goto out_unmap_sg;
+
+	return ret;
 
+out_unmap_sg:
+	iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
 out_free_iova:
 	iommu_dma_free_iova(cookie, iova, iova_len);
 out_restore_sg:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 91af22a..f971dd3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -104,6 +104,7 @@ enum iommu_cap {
 					   transactions */
 	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
 	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
+	IOMMU_CAP_MERGING,		/* IOMMU supports segment merging */
 };
 
 /*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
                   ` (2 preceding siblings ...)
  2019-06-05 11:11 ` [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU Yoshihiro Shimoda
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch adds a .capable ops to the ipmmu-vmsa driver, reporting
that it can merge scatter-gather segments.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/iommu/ipmmu-vmsa.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 408ad0b..81170b8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -608,6 +608,18 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
+static bool ipmmu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_MERGING:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
 {
 	struct ipmmu_vmsa_domain *domain;
@@ -950,6 +962,7 @@ static struct iommu_group *ipmmu_find_group(struct device *dev)
 }
 
 static const struct iommu_ops ipmmu_ops = {
+	.capable = ipmmu_capable,
 	.domain_alloc = ipmmu_domain_alloc,
 	.domain_free = ipmmu_domain_free,
 	.attach_dev = ipmmu_attach_device,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
                   ` (3 preceding siblings ...)
  2019-06-05 11:11 ` [RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs Yoshihiro Shimoda
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch adds a condition so that the swiotlb memory size limitation
workaround is skipped when the device runs behind an IOMMU.

Tested-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/host/tmio_mmc_core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index 130b91c..c9f6a59 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1194,9 +1194,10 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 	 * Since swiotlb has memory size limitation, this will calculate
 	 * the maximum size locally (because we don't have any APIs for it now)
 	 * and check the current max_req_size. And then, this will update
-	 * the max_req_size if needed as a workaround.
+	 * the max_req_size if needed as a workaround. However, if the driver
+	 * runs on IOMMU, this workaround isn't needed.
 	 */
-	if (swiotlb_max_segment()) {
+	if (swiotlb_max_segment() && !device_iommu_mapped(&pdev->dev)) {
 		unsigned int max_size = (1 << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;
 
 		if (mmc->max_req_size > max_size)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
                   ` (4 preceding siblings ...)
  2019-06-05 11:11 ` [RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible Yoshihiro Shimoda
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch adds a definition for the default max_segs value so that
another driver (renesas_sdhi) can use it in a later patch.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
---
 drivers/mmc/host/tmio_mmc.h      | 1 +
 drivers/mmc/host/tmio_mmc_core.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
index c5ba13f..9e387be 100644
--- a/drivers/mmc/host/tmio_mmc.h
+++ b/drivers/mmc/host/tmio_mmc.h
@@ -106,6 +106,7 @@
 #define TMIO_MASK_IRQ     (TMIO_MASK_READOP | TMIO_MASK_WRITEOP | TMIO_MASK_CMD)
 
 #define TMIO_MAX_BLK_SIZE 512
+#define TMIO_DEFAULT_MAX_SEGS 32
 
 struct tmio_mmc_data;
 struct tmio_mmc_host;
diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c
index c9f6a59..af1343e 100644
--- a/drivers/mmc/host/tmio_mmc_core.c
+++ b/drivers/mmc/host/tmio_mmc_core.c
@@ -1185,7 +1185,7 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host)
 
 	mmc->caps |= MMC_CAP_4_BIT_DATA | pdata->capabilities;
 	mmc->caps2 |= pdata->capabilities2;
-	mmc->max_segs = pdata->max_segs ? : 32;
+	mmc->max_segs = pdata->max_segs ? : TMIO_DEFAULT_MAX_SEGS;
 	mmc->max_blk_size = TMIO_MAX_BLK_SIZE;
 	mmc->max_blk_count = pdata->max_blk_count ? :
 		(PAGE_SIZE / mmc->max_blk_size) * mmc->max_segs;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
                   ` (5 preceding siblings ...)
  2019-06-05 11:11 ` [RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  2019-06-05 11:11 ` [RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible Yoshihiro Shimoda
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

This patch sorts the headers in alphabetical order to ease
maintenance of this part.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/mmc/host/renesas_sdhi_core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index 5e9e36e..c5ee4a6 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,20 +18,20 @@
  *
  */
 
-#include <linux/kernel.h>
 #include <linux/clk.h>
-#include <linux/slab.h>
-#include <linux/module.h>
-#include <linux/of_device.h>
-#include <linux/platform_device.h>
+#include <linux/delay.h>
+#include <linux/kernel.h>
+#include <linux/mfd/tmio.h>
 #include <linux/mmc/host.h>
 #include <linux/mmc/slot-gpio.h>
-#include <linux/mfd/tmio.h>
-#include <linux/sh_dma.h>
-#include <linux/delay.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
 #include <linux/pinctrl/consumer.h>
 #include <linux/pinctrl/pinctrl-state.h>
+#include <linux/platform_device.h>
 #include <linux/regulator/consumer.h>
+#include <linux/sh_dma.h>
+#include <linux/slab.h>
 #include <linux/sys_soc.h>
 
 #include "renesas_sdhi.h"
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible
  2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
                   ` (6 preceding siblings ...)
  2019-06-05 11:11 ` [RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers Yoshihiro Shimoda
@ 2019-06-05 11:11 ` Yoshihiro Shimoda
  7 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-05 11:11 UTC (permalink / raw)
  To: ulf.hansson, wsa+renesas, hch, m.szyprowski, robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc, Yoshihiro Shimoda

If the IOMMU driver reports that it can strictly merge segments into
a single segment, this patch exposes a larger host->mmc->max_segs to
the block layer via blk_queue_max_segments(), which mmc_setup_queue()
calls. Note that an SDIO card may use multiple segments with
non-page-aligned sizes, so in that case this does not raise max_segs,
to avoid dma_map_sg() failing every time.

Note that on renesas_sdhi_sys_dmac the max_segs value changes from 32
to 512, but the sys_dmac can handle 512 segments, so the init_card
ops is installed for all "TMIO_MMC_MIN_RCAR2" platforms.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
---
 drivers/mmc/host/renesas_sdhi_core.c          | 27 +++++++++++++++++++++++++++
 drivers/mmc/host/renesas_sdhi_internal_dmac.c |  4 ++++
 2 files changed, 31 insertions(+)

diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index c5ee4a6..379cefa 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -20,6 +20,7 @@
 
 #include <linux/clk.h>
 #include <linux/delay.h>
+#include <linux/iommu.h>
 #include <linux/kernel.h>
 #include <linux/mfd/tmio.h>
 #include <linux/mmc/host.h>
@@ -46,6 +47,8 @@
 #define SDHI_VER_GEN3_SD	0xcc10
 #define SDHI_VER_GEN3_SDMMC	0xcd10
 
+#define SDHI_MAX_SEGS_IN_IOMMU	512
+
 struct renesas_sdhi_quirks {
 	bool hs400_disabled;
 	bool hs400_4taps;
@@ -203,6 +206,28 @@ static void renesas_sdhi_clk_disable(struct tmio_mmc_host *host)
 	clk_disable_unprepare(priv->clk_cd);
 }
 
+static void renesas_sdhi_init_card(struct mmc_host *mmc, struct mmc_card *card)
+{
+	struct tmio_mmc_host *host = mmc_priv(mmc);
+
+	/*
+	 * If the IOMMU driver reports that it can strictly merge
+	 * segments into a single segment, expose a larger
+	 * host->mmc->max_segs to the block layer via
+	 * blk_queue_max_segments(), which mmc_setup_queue() calls.
+	 * Note that an SDIO card may use multiple segments with
+	 * non-page-aligned sizes, so do not raise max_segs in that
+	 * case, to avoid dma_map_sg() failing every time.
+	 */
+	if (host->pdata->max_segs < SDHI_MAX_SEGS_IN_IOMMU &&
+	    iommu_capable(host->pdev->dev.bus, IOMMU_CAP_MERGING) &&
+	    (mmc_card_mmc(card) || mmc_card_sd(card)))
+		host->mmc->max_segs = SDHI_MAX_SEGS_IN_IOMMU;
+	else
+		host->mmc->max_segs = host->pdata->max_segs ? :
+				      TMIO_DEFAULT_MAX_SEGS;
+}
+
 static int renesas_sdhi_card_busy(struct mmc_host *mmc)
 {
 	struct tmio_mmc_host *host = mmc_priv(mmc);
@@ -726,6 +751,8 @@ int renesas_sdhi_probe(struct platform_device *pdev,
 
 	/* SDR speeds are only available on Gen2+ */
 	if (mmc_data->flags & TMIO_MMC_MIN_RCAR2) {
+		host->ops.init_card = renesas_sdhi_init_card;
+
 		/* card_busy caused issues on r8a73a4 (pre-Gen2) CD-less SDHI */
 		host->ops.card_busy = renesas_sdhi_card_busy;
 		host->ops.start_signal_voltage_switch =
diff --git a/drivers/mmc/host/renesas_sdhi_internal_dmac.c b/drivers/mmc/host/renesas_sdhi_internal_dmac.c
index 751fe91..a442f86 100644
--- a/drivers/mmc/host/renesas_sdhi_internal_dmac.c
+++ b/drivers/mmc/host/renesas_sdhi_internal_dmac.c
@@ -11,6 +11,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/io-64-nonatomic-hi-lo.h>
+#include <linux/iommu.h>
 #include <linux/mfd/tmio.h>
 #include <linux/mmc/host.h>
 #include <linux/mod_devicetable.h>
@@ -337,6 +338,9 @@ static int renesas_sdhi_internal_dmac_probe(struct platform_device *pdev)
 	/* value is max of SD_SECCNT. Confirmed by HW engineers */
 	dma_set_max_seg_size(dev, 0xffffffff);
 
+	if (iommu_capable(dev->bus, IOMMU_CAP_MERGING))
+		dma_set_iova_contiguous(dev, true);
+
 	return renesas_sdhi_probe(pdev, &renesas_sdhi_internal_dmac_dma_ops);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 11:11 ` [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING Yoshihiro Shimoda
@ 2019-06-05 12:21   ` Robin Murphy
  2019-06-05 12:38     ` Christoph Hellwig
  2019-06-06  5:53     ` Yoshihiro Shimoda
  2019-06-05 16:17   ` Sergei Shtylyov
  1 sibling, 2 replies; 18+ messages in thread
From: Robin Murphy @ 2019-06-05 12:21 UTC (permalink / raw)
  To: Yoshihiro Shimoda, ulf.hansson, wsa+renesas, hch, m.szyprowski, joro
  Cc: linux-mmc, iommu, linux-renesas-soc

On 05/06/2019 12:11, Yoshihiro Shimoda wrote:
> This patch adds a new capable IOMMU_CAP_MERGING to check whether
> the IOVA would be contiguous strictly if a device requires and
> the IOMMU driver has the capable.
> 
> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> ---
>   drivers/iommu/dma-iommu.c | 26 ++++++++++++++++++++++++--
>   include/linux/iommu.h     |  1 +
>   2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 034caae..ecf1a04 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -847,11 +847,16 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	dma_addr_t iova;
>   	size_t iova_len = 0;
>   	unsigned long mask = dma_get_seg_boundary(dev);
> -	int i;
> +	int i, ret;
> +	bool iova_contiguous = false;
>   
>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>   		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
>   
> +	if (dma_get_iova_contiguous(dev) &&
> +	    iommu_capable(dev->bus, IOMMU_CAP_MERGING))
> +		iova_contiguous = true;
> +
>   	/*
>   	 * Work out how much IOVA space we need, and align the segments to
>   	 * IOVA granules for the IOMMU driver to handle. With some clever
> @@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   		sg_dma_len(s) = s_length;
>   		s->offset -= s_iova_off;
>   		s_length = iova_align(iovad, s_length + s_iova_off);
> +		/*
> +		 * Check that the IOVA would be strictly contiguous if
> +		 * the device requires it and the IOMMU driver is capable.
> +		 */
> +		if (iova_contiguous && i > 0 &&
> +		    (s_iova_off || s->length != s_length))
> +			return 0;
>   		s->length = s_length;
>   
>   		/*
> @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>   	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
>   		goto out_free_iova;
>   
> -	return __finalise_sg(dev, sg, nents, iova);
> +	ret = __finalise_sg(dev, sg, nents, iova);
> +	/*
> +	 * Check that the list was merged into a single sg entry if the
> +	 * device requires it and the IOMMU driver is capable.
> +	 */
> +	if (iova_contiguous && ret != 1)
> +		goto out_unmap_sg;

I don't see that just failing really gives this option any value. 
Clearly the MMC driver has to do *something* to handle the failure (plus 
presumably the case of not having IOMMU DMA ops at all), which begs the 
question of why it couldn't just do whatever that is anyway, without all 
this infrastructure. For starters, it would be a far simpler and less 
invasive patch:

	if (dma_map_sg(...) > 1) {
		dma_unmap_sg(...);
		/* split into multiple requests and try again */
	}

But then it would make even more sense to just have the driver be 
proactive about its special requirement in the first place, and simply 
validate the list before it even tries to map it:

	for_each_sg(sgl, sg, n, i)
		if ((i > 0 && sg->offset % PAGE_SIZE) ||
		    (i < n - 1 && sg->length % PAGE_SIZE))
			/* segment will not be mergeable */

For reference, I think v4l2 and possibly some areas of DRM already do 
something vaguely similar to judge whether they get contiguous buffers 
or not.

> +
> +	return ret;
>   
> +out_unmap_sg:
> +	iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
>   out_free_iova:
>   	iommu_dma_free_iova(cookie, iova, iova_len);
>   out_restore_sg:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 91af22a..f971dd3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -104,6 +104,7 @@ enum iommu_cap {
>   					   transactions */
>   	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
>   	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> +	IOMMU_CAP_MERGING,		/* IOMMU supports segments merging */

This isn't a 'capability' of the IOMMU - "segment merging" equates to 
just remapping pages, and there's already a fundamental assumption that 
IOMMUs are capable of that. Plus it's very much a DMA API concept, so 
hardly belongs in the IOMMU API anyway.

All in all, I'm struggling to see the point of this. Although it's not a 
DMA API guarantee, iommu-dma already merges scatterlists as aggressively 
as it is allowed to, and will continue to do so for the foreseeable 
future (since it avoids considerable complication in the IOVA 
allocation), so if you want to make sure iommu_dma_map_sg() merges an 
entire list, just don't give it a non-mergeable list. And if you still 
really really want dma_map_sg() to have a behaviour of "merge to a 
single segment or fail", then that should at least be a DMA API 
attribute, which could in principle be honoured by bounce-buffering 
implementations as well.

And if the problem is really that you're not getting merging because of 
exposing the wrong parameters to the DMA API and/or the block layer, or 
that you just can't quite express your requirement to the block layer in 
the first place, then that should really be tackled at the source rather 
than worked around further down in the stack.

Robin.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 12:21   ` Robin Murphy
@ 2019-06-05 12:38     ` Christoph Hellwig
  2019-06-06  6:28       ` Yoshihiro Shimoda
  2019-06-06  5:53     ` Yoshihiro Shimoda
  1 sibling, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2019-06-05 12:38 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yoshihiro Shimoda, ulf.hansson, wsa+renesas, hch, m.szyprowski,
	joro, linux-mmc, iommu, linux-renesas-soc

On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> And if the problem is really that you're not getting merging because of 
> exposing the wrong parameters to the DMA API and/or the block layer, or 
> that you just can't quite express your requirement to the block layer in 
> the first place, then that should really be tackled at the source rather 
> than worked around further down in the stack.

The problem is that we need a way to communicate to the block layer
that more than a single segment is ok IFF the DMA API instance supports
merging.  And of course the answer will depend on further parameters
like the maximum merged segment size and alignment for the segment.

We'll need some way to communicate that, but I don't really think
this series is the way to go.

We'd really need something hanging off the device (or available
through a query API) that describes whether, and under what
circumstances, the dma_map_ops implementation can merge.  The driver
can then communicate that to the block layer so that the block layer
doesn't split requests up when reaching the segment limit.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 11:11 ` [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING Yoshihiro Shimoda
  2019-06-05 12:21   ` Robin Murphy
@ 2019-06-05 16:17   ` Sergei Shtylyov
  1 sibling, 0 replies; 18+ messages in thread
From: Sergei Shtylyov @ 2019-06-05 16:17 UTC (permalink / raw)
  To: Yoshihiro Shimoda, ulf.hansson, wsa+renesas, hch, m.szyprowski,
	robin.murphy, joro
  Cc: linux-mmc, iommu, linux-renesas-soc

On 06/05/2019 02:11 PM, Yoshihiro Shimoda wrote:

> This patch adds a new capable IOMMU_CAP_MERGING to check whether
> the IOVA would be contiguous strictly if a device requires and
> the IOMMU driver has the capable.

   s/has/is/? Or capable what?

> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
> ---
>  drivers/iommu/dma-iommu.c | 26 ++++++++++++++++++++++++--
>  include/linux/iommu.h     |  1 +
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 034caae..ecf1a04 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
[...]
> @@ -867,6 +872,13 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  		sg_dma_len(s) = s_length;
>  		s->offset -= s_iova_off;
>  		s_length = iova_align(iovad, s_length + s_iova_off);
> +		/*
> +		 * Check whether the IOVA would be contiguous strictly if
> +		 * a device requires and the IOMMU driver has the capable.

   Same question here...

> +		 */
> +		if (iova_contiguous && i > 0 &&
> +		    (s_iova_off || s->length != s_length))
> +			return 0;
>  		s->length = s_length;
>  
>  		/*
> @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>  	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
>  		goto out_free_iova;
>  
> -	return __finalise_sg(dev, sg, nents, iova);
> +	ret = __finalise_sg(dev, sg, nents, iova);
> +	/*
> +	 * Check whether the sg entry is single if a device requires and
> +	 * the IOMMU driver has the capable.

   You  meant capability perhaps?

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 12:21   ` Robin Murphy
  2019-06-05 12:38     ` Christoph Hellwig
@ 2019-06-06  5:53     ` Yoshihiro Shimoda
  1 sibling, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-06  5:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: linux-mmc, iommu, linux-renesas-soc, ulf.hansson, wsa+renesas,
	hch, m.szyprowski, joro

Hi Robin,

Thank you for your comments!

> From: Robin Murphy, Sent: Wednesday, June 5, 2019 9:22 PM
<snip>
> > @@ -902,8 +914,18 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> >   	if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len)
> >   		goto out_free_iova;
> >
> > -	return __finalise_sg(dev, sg, nents, iova);
> > +	ret = __finalise_sg(dev, sg, nents, iova);
> > +	/*
> > +	 * Check whether the sg entry is single if a device requires and
> > +	 * the IOMMU driver has the capable.
> > +	 */
> > +	if (iova_contiguous && ret != 1)
> > +		goto out_unmap_sg;
> 
> I don't see that just failing really gives this option any value.
> Clearly the MMC driver has to do *something* to handle the failure (plus
> presumably the case of not having IOMMU DMA ops at all), which begs the
> question of why it couldn't just do whatever that is anyway, without all
> this infrastructure. For starters, it would be a far simpler and less
> invasive patch:
> 
> 	if (dma_map_sg(...) > 1) {
> 		dma_unmap_sg(...);
> 		/* split into multiple requests and try again */
> 	}

I understood it.

> But then it would make even more sense to just have the driver be
> proactive about its special requirement in the first place, and simply
> validate the list before it even tries to map it:
> 
> 	for_each_sg(sgl, sg, n, i)
> 		if ((i > 0 && sg->offset % PAGE_SIZE) ||
> 		    (i < n - 1 && sg->length % PAGE_SIZE))
> 			/* segment will not be mergeable */

In a previous version, I wrote such code [1].
But I think I misunderstood Christoph's comments [2] [3].

[1]
https://patchwork.kernel.org/patch/10970047/

[2]
https://marc.info/?l=linux-renesas-soc&m=155956751811689&w=2

[3]
https://marc.info/?l=linux-renesas-soc&m=155852814607202&w=2

> For reference, I think v4l2 and possibly some areas of DRM already do
> something vaguely similar to judge whether they get contiguous buffers
> or not.

I see. I'll check these areas later.

> > +
> > +	return ret;
> >
> > +out_unmap_sg:
> > +	iommu_dma_unmap_sg(dev, sg, nents, dir, attrs);
> >   out_free_iova:
> >   	iommu_dma_free_iova(cookie, iova, iova_len);
> >   out_restore_sg:
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 91af22a..f971dd3 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -104,6 +104,7 @@ enum iommu_cap {
> >   					   transactions */
> >   	IOMMU_CAP_INTR_REMAP,		/* IOMMU supports interrupt isolation */
> >   	IOMMU_CAP_NOEXEC,		/* IOMMU_NOEXEC flag */
> > +	IOMMU_CAP_MERGING,		/* IOMMU supports segments merging */
> 
> This isn't a 'capability' of the IOMMU - "segment merging" equates to
> just remapping pages, and there's already a fundamental assumption that
> IOMMUs are capable of that. Plus it's very much a DMA API concept, so
> hardly belongs in the IOMMU API anyway.

I got it.

> All in all, I'm struggling to see the point of this. Although it's not a
> DMA API guarantee, iommu-dma already merges scatterlists as aggressively
> as it is allowed to, and will continue to do so for the foreseeable
> future (since it avoids considerable complication in the IOVA
> allocation), so if you want to make sure iommu_dma_map_sg() merges an
> entire list, just don't give it a non-mergeable list.

Thank you for the explanation. I didn't know that a driver should not
give it a non-mergeable list.

> And if you still
> really really want dma_map_sg() to have a behaviour of "merge to a
> single segment or fail", then that should at least be a DMA API
> attribute, which could in principle be honoured by bounce-buffering
> implementations as well.

I got it. For this patch series, it seems I should modify the block
layer instead, so that such a new DMA API attribute is not needed.

> And if the problem is really that you're not getting merging because of
> exposing the wrong parameters to the DMA API and/or the block layer, or
> that you just can't quite express your requirement to the block layer in
> the first place, then that should really be tackled at the source rather
> than worked around further down in the stack.

I'll reply on Christoph email about this topic later.

Best regards,
Yoshihiro Shimoda

> Robin.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-05 12:38     ` Christoph Hellwig
@ 2019-06-06  6:28       ` Yoshihiro Shimoda
  2019-06-06  7:00         ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-06  6:28 UTC (permalink / raw)
  To: Christoph Hellwig, Robin Murphy
  Cc: ulf.hansson, wsa+renesas, m.szyprowski, joro, linux-mmc, iommu,
	linux-renesas-soc

Hi Christoph,

Thank you for your comments!

> From: Christoph Hellwig, Sent: Wednesday, June 5, 2019 9:38 PM
> 
> On Wed, Jun 05, 2019 at 01:21:59PM +0100, Robin Murphy wrote:
> > And if the problem is really that you're not getting merging because of
> > exposing the wrong parameters to the DMA API and/or the block layer, or
> > that you just can't quite express your requirement to the block layer in
> > the first place, then that should really be tackled at the source rather
> > than worked around further down in the stack.
> 
> The problem is that we need a way to communicate to the block layer
> that more than a single segment is ok IFF the DMA API instance supports
> merging.  And of course the answer will depend on further parameters 
> like the maximum merged segment size and alignment for the segment.

I'm afraid I don't understand why we need a way to communicate to
the block layer that more than a single segment is ok IFF the DMA API
instance supports merging.

> We'll need some way to communicate that, but I don't really think
> this series is the way to go.

Then I should discard patches 1/8 through 4/8.

> We'd really need something hanging off the device (or available
> through a query API) that describes whether, and under what
> circumstances, the dma_map_ops implementation can merge.  The driver
> can then communicate that to the block layer so that the block layer
> doesn't split requests up when reaching the segment limit.

The block layer already has a "max_segment_size" limit for each device,
so regardless of whether the segments can be merged, we can use that
limit. Is my understanding incorrect?

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-06  6:28       ` Yoshihiro Shimoda
@ 2019-06-06  7:00         ` Christoph Hellwig
  2019-06-07  5:41           ` Yoshihiro Shimoda
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2019-06-06  7:00 UTC (permalink / raw)
  To: Yoshihiro Shimoda
  Cc: Christoph Hellwig, Robin Murphy, ulf.hansson, wsa+renesas,
	m.szyprowski, joro, linux-mmc, iommu, linux-renesas-soc

On Thu, Jun 06, 2019 at 06:28:47AM +0000, Yoshihiro Shimoda wrote:
> > The problem is that we need a way to communicate to the block layer
> > that more than a single segment is ok IFF the DMA API instance supports
> > merging.  And of course the answer will depend on further parameters
> > like the maximum merged segment size and alignment for the segment.
> 
> I'm afraid I don't understand why we need a way to communicate to
> the block layer that more than a single segment is ok IFF the DMA API
> instance supports merging.

Assume a device (which I think is your case) that only supports a single
segment in hardware.  In that case we set max_segments to 1 if no
IOMMU is present.  But if we have a merge capable IOMMU we can set
max_segments to unlimited (or some software limit for scatterlist
allocation), as long as we set a virt_boundary matching what the IOMMU
expects, and max_sectors_kb isn't larger than the max IOMMU mapping
size.  Now we could probably just open code this in the driver, but
I'd feel much happier having a block layer helper like this:

bool blk_can_use_iommu_merging(struct request_queue *q, struct device *dev)
{
	if (!IOMMU_CAN_MERGE_SEGMENTS(dev))
		return false;

	blk_queue_virt_boundary(q, IOMMU_PAGE_SIZE(dev));
	blk_queue_max_segment_size(q, IOMMU_MAX_SEGMENT_SIZE(dev));
	return true;
}

and the driver then does:

	if (blk_can_use_iommu_merging(q, dev)) {
		blk_queue_max_segments(q, MAX_SW_SEGMENTS);
		// initialize sg mempool, etc..
	}


Where the SCREAMING pseudo code calls are something we need to find a
good API for.

And thinking about it, the backend doesn't need to be an IOMMU;
swiotlb could handle this as well, which might be interesting for
devices that need to bounce buffer anyway.  IIRC mmc actually has some
code to copy multiple segments into a bounce buffer somewhere.

> The block layer already has a "max_segment_size" limit for each
> device, so regardless of whether the segments can be merged, we can
> use that limit. Is my understanding incorrect?

Yes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-06  7:00         ` Christoph Hellwig
@ 2019-06-07  5:41           ` Yoshihiro Shimoda
  2019-06-07  5:49             ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-07  5:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, ulf.hansson, wsa+renesas, m.szyprowski, joro,
	linux-mmc, iommu, linux-renesas-soc

Hi Christoph,

> From: Christoph Hellwig, Sent: Thursday, June 6, 2019 4:01 PM
> 
> On Thu, Jun 06, 2019 at 06:28:47AM +0000, Yoshihiro Shimoda wrote:
> > > The problem is that we need a way to communicate to the block layer
> > > that more than a single segment is ok IFF the DMA API instance supports
> > > merging.  And of course the answer will depend on further parameters
> > > like the maximum merged segment size and alignment for the segment.
> >
> > I'm afraid I don't understand why we need a way to communicate to
> > the block layer that more than a single segment is ok IFF the DMA API
> > instance supports merging.
> 
> Assume a device (which I think is your case) that only supports a single
> segment in hardware.  In that case we set max_segments to 1 if no
> IOMMU is present.  But if we have a merge capable IOMMU we can set
> max_segments to unlimited (or some software limit for scatterlist
> allocation), as long as we set a virt_boundary matching what the IOMMU
> expects, and max_sectors_kb isn't larger than the max IOMMU mapping
> size.  Now we could probably just open code this in the driver, but
> I'd feel much happier having a block layer helper like this:

Thank you for the explanation in detail!

> bool blk_can_use_iommu_merging(struct request_queue *q, struct device *dev)
> {
> 	if (!IOMMU_CAN_MERGE_SEGMENTS(dev))
> 		return false;

As Robin mentioned, all IOMMUs can merge segments, so we don't need
this condition, IIUC. However, this should check whether the device is
mapped through an IOMMU, using device_iommu_mapped().

> 	blk_queue_virt_boundary(q, IOMMU_PAGE_SIZE(dev));
> 	blk_queue_max_segment_size(q, IOMMU_MAX_SEGMENT_SIZE(dev));

By the way, I reported an issue [1], and I'm thinking a dma_is_direct()
environment (especially for swiotlb) also needs such max_segment_size
changes somehow. What do you think?

[1]
https://marc.info/?l=linux-block&m=155954415603356&w=2

> 	return true;
> }
> 
> and the driver then does:
> 
> 	if (blk_can_use_iommu_merging(q, dev)) {
> 		blk_queue_max_segments(q, MAX_SW_SEGMENTS);
> 		// initialize sg mempool, etc..
> 	}

In this case, I assume that "the driver" is ./drivers/mmc/core/queue.c,
not any drivers/mmc/host/ code.

> Where the SCREAMING pseudo code calls are something we need to find a
> good API for.

I assumed
 - IOMMU_PAGE_SIZE(dev) = dma_get_seg_boundary(dev);
 - IOMMU_MAX_SEGMENT_SIZE(dev) = dma_get_max_seg_size(dev);

I could not find an existing "IOMMU_PAGE_SIZE(dev)" for now.
If that's correct, I'll add such a new API.

> And thinking about it, the backend doesn't need to be an IOMMU;
> swiotlb could handle this as well, which might be interesting for
> devices that need to bounce buffer anyway.  IIRC mmc actually has some
> code to copy multiple segments into a bounce buffer somewhere.

I see. So, as I mentioned above, it seems swiotlb also needs this.
IIUC, mmc no longer has a bounce buffer since the commit [2].

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/mmc/core?h=v5.2-rc3&id=de3ee99b097dd51938276e3af388cd4ad0f2750a

> > The block layer already has a "max_segment_size" limit for each
> > device, so regardless of whether the segments can be merged, we can
> > use that limit. Is my understanding incorrect?
> 
> Yes.

Now I understand that the block layer's max_segment_size differs from the IOMMU's.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-07  5:41           ` Yoshihiro Shimoda
@ 2019-06-07  5:49             ` Christoph Hellwig
  2019-06-07  6:01               ` Yoshihiro Shimoda
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2019-06-07  5:49 UTC (permalink / raw)
  To: Yoshihiro Shimoda
  Cc: Christoph Hellwig, Robin Murphy, ulf.hansson, wsa+renesas,
	m.szyprowski, joro, linux-mmc, iommu, linux-renesas-soc

On Fri, Jun 07, 2019 at 05:41:56AM +0000, Yoshihiro Shimoda wrote:
> > bool blk_can_use_iommu_merging(struct request_queue *q, struct device *dev)
> > {
> > 	if (!IOMMU_CAN_MERGE_SEGMENTS(dev))
> > 		return false;
> 
> As Robin mentioned, all IOMMUs can merge segments, so we don't need
> this condition, IIUC. However, this should check whether the device is
> mapped through an IOMMU, using device_iommu_mapped().

There are plenty of dma_map_ops based drivers that can't merge segments.
Examples:

 - arch/ia64/sn/pci/pci_dma.c
 - arch/mips/jazz/jazzdma.c
 - arch/sparc/mm/io-unit.c
 - arch/sparc/mm/iommu.c
 - arch/x86/kernel/pci-calgary_64.c

Never mind the direct mapping, swiotlb and other weirdos.

> 
> > 	blk_queue_virt_boundary(q, IOMMU_PAGE_SIZE(dev));
> > 	blk_queue_max_segment_size(q, IOMMU_MAX_SEGMENT_SIZE(dev));
> 
> By the way, I reported an issue [1], and I'm thinking a dma_is_direct()
> environment (especially for swiotlb) also needs such max_segment_size
> changes somehow. What do you think?
> 
> [1]
> https://marc.info/?l=linux-block&m=155954415603356&w=2

That doesn't seem to be related to the segment merging.  I'll take
a look, but next time please Cc the author of a suspect commit if
you have already bisected things.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING
  2019-06-07  5:49             ` Christoph Hellwig
@ 2019-06-07  6:01               ` Yoshihiro Shimoda
  0 siblings, 0 replies; 18+ messages in thread
From: Yoshihiro Shimoda @ 2019-06-07  6:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Robin Murphy, ulf.hansson, wsa+renesas, m.szyprowski, joro,
	linux-mmc, iommu, linux-renesas-soc

Hi Christoph,

> From: Christoph Hellwig, Sent: Friday, June 7, 2019 2:50 PM
> 
> On Fri, Jun 07, 2019 at 05:41:56AM +0000, Yoshihiro Shimoda wrote:
> > > bool blk_can_use_iommu_merging(struct request_queue *q, struct device *dev)
> > > {
> > > 	if (!IOMMU_CAN_MERGE_SEGMENTS(dev))
> > > 		return false;
> >
> > As Robin mentioned, all IOMMUs can merge segments, so we don't need
> > this condition, IIUC. However, this should check whether the device is
> > mapped through an IOMMU, using device_iommu_mapped().
> 
> There are plenty of dma_map_ops based drivers that can't merge segments.
> Examples:
> 
>  - arch/ia64/sn/pci/pci_dma.c
>  - arch/mips/jazz/jazzdma.c
>  - arch/sparc/mm/io-unit.c
>  - arch/sparc/mm/iommu.c
>  - arch/x86/kernel/pci-calgary_64.c

Thank you for the pointers. I'll check that code.

> Never mind the direct mapping, swiotlb and other weirdos.

I got it.

> > > 	blk_queue_virt_boundary(q, IOMMU_PAGE_SIZE(dev));
> > > 	blk_queue_max_segment_size(q, IOMMU_MAX_SEGMENT_SIZE(dev));
> >
> > By the way, I reported an issue [1], and I'm thinking a dma_is_direct()
> > environment (especially for swiotlb) also needs such max_segment_size
> > changes somehow. What do you think?
> >
> > [1]
> > https://marc.info/?l=linux-block&m=155954415603356&w=2
> 
> That doesn't seem to be related to the segment merging.  I'll take
> a look, but next time please Cc the author of a suspect commit if
> you already bisect things.

Oops. I'll Cc the author next time.

Best regards,
Yoshihiro Shimoda


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-06-07  6:01 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-05 11:11 [RFC PATCH v5 0/8] treewide: improve R-Car SDHI performance Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 1/8] dma-mapping: add a device driver helper for iova contiguous Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 2/8] iommu/dma: move iommu_dma_unmap_sg() place Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 3/8] iommu: add a new capable IOMMU_CAP_MERGING Yoshihiro Shimoda
2019-06-05 12:21   ` Robin Murphy
2019-06-05 12:38     ` Christoph Hellwig
2019-06-06  6:28       ` Yoshihiro Shimoda
2019-06-06  7:00         ` Christoph Hellwig
2019-06-07  5:41           ` Yoshihiro Shimoda
2019-06-07  5:49             ` Christoph Hellwig
2019-06-07  6:01               ` Yoshihiro Shimoda
2019-06-06  5:53     ` Yoshihiro Shimoda
2019-06-05 16:17   ` Sergei Shtylyov
2019-06-05 11:11 ` [RFC PATCH v5 4/8] iommu/ipmmu-vmsa: add capable ops Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 5/8] mmc: tmio: No memory size limitation if runs on IOMMU Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 6/8] mmc: tmio: Add a definition for default max_segs Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 7/8] mmc: renesas_sdhi: sort headers Yoshihiro Shimoda
2019-06-05 11:11 ` [RFC PATCH v5 8/8] mmc: renesas_sdhi: use multiple segments if possible Yoshihiro Shimoda
