iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.*
@ 2021-01-28  0:38 Jianxiong Gao via iommu
  2021-01-28  0:38 ` [PATCH 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao via iommu
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28  0:38 UTC (permalink / raw)
  To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
	gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
	andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
	konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel

NVMe driver and other applications may depend on the data offset
to operate correctly. Currently when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
patch adds an option to make sure the mapped data preserves its
offset of the orginal addrss.

Without the patch when creating xfs formatted disk on NVMe backends,
with swiotlb=force in kernel boot option, creates the following error:
meta-data=/dev/nvme2n1           isize=512    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, refl
ink=0
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error

Jianxiong Gao (3):
  Adding page_offset_mask to device_dma_parameters
  Add swiotlb offset preserving mapping when
    dma_dma_parameters->page_offset_mask is non zero.
  Adding device_dma_parameters->offset_preserve_mask to NVMe driver.

 drivers/nvme/host/pci.c     |  4 ++++
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 17 +++++++++++++++++
 kernel/dma/swiotlb.c        | 16 +++++++++++++++-
 4 files changed, 37 insertions(+), 1 deletion(-)

-- 
2.27.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] Adding page_offset_mask to device_dma_parameters
  2021-01-28  0:38 [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Jianxiong Gao via iommu
@ 2021-01-28  0:38 ` Jianxiong Gao via iommu
  2021-01-28 17:27   ` Robin Murphy
  2021-01-28  0:38 ` [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao via iommu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28  0:38 UTC (permalink / raw)
  To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
	gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
	andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
	konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel

Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.

Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 include/linux/device.h      |  1 +
 include/linux/dma-mapping.h | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..f44e0659fc66 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -292,6 +292,7 @@ struct device_dma_parameters {
 	 */
 	unsigned int max_segment_size;
 	unsigned long segment_boundary_mask;
+	unsigned int page_offset_mask;
 };
 
 /**
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..5529a31fefba 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
 	return -EIO;
 }
 
+static inline unsigned int dma_get_page_offset_mask(struct device *dev)
+{
+	if (dev->dma_parms)
+		return dev->dma_parms->page_offset_mask;
+	return 0;
+}
+
+static inline int dma_set_page_offset_mask(struct device *dev,
+		unsigned int page_offset_mask)
+{
+	if (dev->dma_parms) {
+		dev->dma_parms->page_offset_mask = page_offset_mask;
+		return 0;
+	}
+	return -EIO;
+}
+
 static inline int dma_get_cache_alignment(void)
 {
 #ifdef ARCH_DMA_MINALIGN
-- 
2.27.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
  2021-01-28  0:38 [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Jianxiong Gao via iommu
  2021-01-28  0:38 ` [PATCH 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao via iommu
@ 2021-01-28  0:38 ` Jianxiong Gao via iommu
  2021-01-28 17:15   ` Konrad Rzeszutek Wilk
  2021-01-28 18:16   ` Christoph Hellwig
  2021-01-28  0:38 ` [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao via iommu
  2021-01-28  8:04 ` [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Greg KH
  3 siblings, 2 replies; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28  0:38 UTC (permalink / raw)
  To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
	gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
	andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
	konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel

For devices that need to preserve address offset on mapping through
swiotlb, this patch adds offset preserving based on page_offset_mask
and keeps the offset if the mask is non zero. This is needed for
device drivers like NVMe.

Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 kernel/dma/swiotlb.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..4cab35f2c9bc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
 	unsigned long flags;
 	phys_addr_t tlb_addr;
-	unsigned int nslots, stride, index, wrap;
+	unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
 	int i;
 	unsigned long mask;
 	unsigned long offset_slots;
@@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
 		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
 
+	page_offset_mask = dma_get_page_offset_mask(hwdev);
+	page_offset = orig_addr & page_offset_mask;
+	alloc_size += page_offset;
+
 	/*
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
 	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-	if (alloc_size >= PAGE_SIZE)
+	if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << IO_TLB_SHIFT)))
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
 		stride = 1;
@@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
 	 */
 	for (i = 0; i < nslots; i++)
 		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+	/*
+	 * When keeping the offset of the original data, we need to advance
+	 * the tlb_addr by the offset of orig_addr.
+	 */
+	tlb_addr += page_offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	unsigned int num_page_offset_slabs, page_offset_mask = dma_get_page_offset_mask(hwdev);
+	int i, count;
+	int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	    ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
 
+	/*
+	 * When dma_get_page_offset_mask is used, we may have padded more slabs
+	 * when padding exceeds one slab. We need to move index back to the
+	 * beginning of the padding.
+	 */
+	num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << IO_TLB_SHIFT);
+	index -= num_page_offset_slabs;
+
 	/*
 	 * Return the buffer to the free list by setting the corresponding
 	 * entries to indicate the number of contiguous entries available.
-- 
2.27.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
  2021-01-28  0:38 [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Jianxiong Gao via iommu
  2021-01-28  0:38 ` [PATCH 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao via iommu
  2021-01-28  0:38 ` [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao via iommu
@ 2021-01-28  0:38 ` Jianxiong Gao via iommu
  2021-01-28 18:00   ` Robin Murphy
  2021-01-28  8:04 ` [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Greg KH
  3 siblings, 1 reply; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28  0:38 UTC (permalink / raw)
  To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
	gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
	andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
	konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel

NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.

Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
 drivers/nvme/host/pci.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 856aa31931c1..0b23f04068be 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
 static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+	if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
 				    rq_dma_dir(req));
 	else
 		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+	if (dma_set_page_offset_mask(dev->dev, 0))
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
 }
 
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	blk_status_t ret = BLK_STS_RESOURCE;
-	int nr_mapped;
+	int nr_mapped, offset_ret;
 
 	if (blk_rq_nr_phys_segments(req) == 1) {
 		struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	if (!iod->nents)
 		goto out_free_sg;
 
+	offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+	if (offset_ret) {
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
+		goto out_free_sg;
+	}
+
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
 				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+	offset_ret = dma_set_page_offset_mask(dev->dev, 0);
+	if (offset_ret) {
+		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
+		goto out_free_sg;
+	}
 	if (!nr_mapped)
 		goto out_free_sg;
 
-- 
2.27.0

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.*
  2021-01-28  0:38 [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Jianxiong Gao via iommu
                   ` (2 preceding siblings ...)
  2021-01-28  0:38 ` [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao via iommu
@ 2021-01-28  8:04 ` Greg KH
  3 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2021-01-28  8:04 UTC (permalink / raw)
  To: Jianxiong Gao
  Cc: axboe, heikki.krogerus, sagi, saravanak, konrad.wilk, marcorr,
	rafael.j.wysocki, linux-kernel, linux-nvme, kbusch, bgolaszewski,
	iommu, jroedel, dan.j.williams, andriy.shevchenko, robin.murphy,
	hch

On Wed, Jan 27, 2021 at 04:38:26PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications may depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
> patch adds an option to make sure the mapped data preserves its
> offset of the orginal addrss.
> 
> Without the patch when creating xfs formatted disk on NVMe backends,
> with swiotlb=force in kernel boot option, creates the following error:
> meta-data=/dev/nvme2n1           isize=512    agcount=4, agsize=131072 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=0, rmapbt=0, refl
> ink=0
> data     =                       bsize=4096   blocks=524288, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=4096   blocks=2560, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> mkfs.xfs: pwrite failed: Input/output error
> 
> Jianxiong Gao (3):
>   Adding page_offset_mask to device_dma_parameters
>   Add swiotlb offset preserving mapping when
>     dma_dma_parameters->page_offset_mask is non zero.
>   Adding device_dma_parameters->offset_preserve_mask to NVMe driver.


Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You did not write a descriptive Subject: for the patch, allowing Greg,
  and everyone else, to know what this patch is all about.  Please read
  the section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what a proper Subject: line should
  look like.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
  2021-01-28  0:38 ` [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao via iommu
@ 2021-01-28 17:15   ` Konrad Rzeszutek Wilk
  2021-01-28 17:34     ` Keith Busch
  2021-01-28 18:16   ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2021-01-28 17:15 UTC (permalink / raw)
  To: Jianxiong Gao
  Cc: axboe, heikki.krogerus, sagi, saravanak, marcorr, gregkh,
	rafael.j.wysocki, linux-kernel, linux-nvme, kbusch, bgolaszewski,
	iommu, jroedel, dan.j.williams, andriy.shevchenko, robin.murphy,
	hch

On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> For devices that need to preserve address offset on mapping through
> swiotlb, this patch adds offset preserving based on page_offset_mask
> and keeps the offset if the mask is non zero. This is needed for
> device drivers like NVMe.

<scratches his head>

Didn't you send this patch like a month ago and someone pointed
out that the right fix would be in the NVMe driver?

Is there an issue with fixing the NVMe driver?

> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> ---
>  kernel/dma/swiotlb.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..4cab35f2c9bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
>  	unsigned long flags;
>  	phys_addr_t tlb_addr;
> -	unsigned int nslots, stride, index, wrap;
> +	unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
>  	int i;
>  	unsigned long mask;
>  	unsigned long offset_slots;
> @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>  		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>  
> +	page_offset_mask = dma_get_page_offset_mask(hwdev);
> +	page_offset = orig_addr & page_offset_mask;
> +	alloc_size += page_offset;
> +
>  	/*
>  	 * For mappings greater than or equal to a page, we limit the stride
>  	 * (and hence alignment) to a page size.
>  	 */
>  	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> -	if (alloc_size >= PAGE_SIZE)
> +	if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << IO_TLB_SHIFT)))
>  		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>  	else
>  		stride = 1;
> @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  	 */
>  	for (i = 0; i < nslots; i++)
>  		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> +	/*
> +	 * When keeping the offset of the original data, we need to advance
> +	 * the tlb_addr by the offset of orig_addr.
> +	 */
> +	tlb_addr += page_offset;
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>  	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>  		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
> @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>  			      enum dma_data_direction dir, unsigned long attrs)
>  {
>  	unsigned long flags;
> -	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> +	unsigned int num_page_offset_slabs, page_offset_mask = dma_get_page_offset_mask(hwdev);
> +	int i, count;
> +	int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
>  	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
>  	phys_addr_t orig_addr = io_tlb_orig_addr[index];
>  
> @@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>  	    ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
>  		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
>  
> +	/*
> +	 * When dma_get_page_offset_mask is used, we may have padded more slabs
> +	 * when padding exceeds one slab. We need to move index back to the
> +	 * beginning of the padding.
> +	 */
> +	num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << IO_TLB_SHIFT);
> +	index -= num_page_offset_slabs;
> +
>  	/*
>  	 * Return the buffer to the free list by setting the corresponding
>  	 * entries to indicate the number of contiguous entries available.
> -- 
> 2.27.0
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Adding page_offset_mask to device_dma_parameters
  2021-01-28  0:38 ` [PATCH 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao via iommu
@ 2021-01-28 17:27   ` Robin Murphy
  2021-01-28 18:15     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2021-01-28 17:27 UTC (permalink / raw)
  To: Jianxiong Gao, erdemaktas, marcorr, hch, m.szyprowski, gregkh,
	saravanak, heikki.krogerus, rafael.j.wysocki, andriy.shevchenko,
	dan.j.williams, bgolaszewski, jroedel, iommu, konrad.wilk,
	kbusch, axboe, sagi, linux-nvme, linux-kernel

On 2021-01-28 00:38, Jianxiong Gao wrote:
> Some devices rely on the address offset in a page to function
> correctly (NVMe driver as an example). These devices may use
> a different page size than the Linux kernel. The address offset
> has to be preserved upon mapping, and in order to do so, we
> need to record the page_offset_mask first.
> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> ---
>   include/linux/device.h      |  1 +
>   include/linux/dma-mapping.h | 17 +++++++++++++++++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1779f90eeb4c..f44e0659fc66 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -292,6 +292,7 @@ struct device_dma_parameters {
>   	 */
>   	unsigned int max_segment_size;
>   	unsigned long segment_boundary_mask;
> +	unsigned int page_offset_mask;

Could we call this something more like "min_align_mask" (sorry, I can't 
think of a name that's actually good and descriptive right now). 
Essentially I worry that having "page" in there is going to be too easy 
to misinterpret as having anything to do what "page" means almost 
everywhere else (even before you throw IOMMU pages into the mix).

Also note that of all the possible ways to pack two ints and a long, 
this one is the worst ;)

Robin.

>   };
>   
>   /**
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 2e49996a8f39..5529a31fefba 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
>   	return -EIO;
>   }
>   
> +static inline unsigned int dma_get_page_offset_mask(struct device *dev)
> +{
> +	if (dev->dma_parms)
> +		return dev->dma_parms->page_offset_mask;
> +	return 0;
> +}
> +
> +static inline int dma_set_page_offset_mask(struct device *dev,
> +		unsigned int page_offset_mask)
> +{
> +	if (dev->dma_parms) {
> +		dev->dma_parms->page_offset_mask = page_offset_mask;
> +		return 0;
> +	}
> +	return -EIO;
> +}
> +
>   static inline int dma_get_cache_alignment(void)
>   {
>   #ifdef ARCH_DMA_MINALIGN
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
  2021-01-28 17:15   ` Konrad Rzeszutek Wilk
@ 2021-01-28 17:34     ` Keith Busch
  2021-01-28 18:06       ` Jianxiong Gao via iommu
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2021-01-28 17:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: heikki.krogerus, sagi, saravanak, marcorr, gregkh,
	rafael.j.wysocki, linux-kernel, andriy.shevchenko, axboe,
	bgolaszewski, iommu, jroedel, linux-nvme, dan.j.williams,
	Jianxiong Gao, robin.murphy, hch

On Thu, Jan 28, 2021 at 12:15:28PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> > For devices that need to preserve address offset on mapping through
> > swiotlb, this patch adds offset preserving based on page_offset_mask
> > and keeps the offset if the mask is non zero. This is needed for
> > device drivers like NVMe.
> 
> <scratches his head>
> 
> Didn't you send this patch like a month ago and someone pointed
> out that the right fix would be in the NVMe driver?
> 
> Is there an issue with fixing the NVMe driver?

You got it backwards. The initial "fix" used a flag specific to the nvme
driver, and it was pointed out that it should just be the generic
behaviour.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
  2021-01-28  0:38 ` [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao via iommu
@ 2021-01-28 18:00   ` Robin Murphy
  2021-01-28 18:18     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Robin Murphy @ 2021-01-28 18:00 UTC (permalink / raw)
  To: Jianxiong Gao, erdemaktas, marcorr, hch, m.szyprowski, gregkh,
	saravanak, heikki.krogerus, rafael.j.wysocki, andriy.shevchenko,
	dan.j.williams, bgolaszewski, jroedel, iommu, konrad.wilk,
	kbusch, axboe, sagi, linux-nvme, linux-kernel

On 2021-01-28 00:38, Jianxiong Gao wrote:
> NVMe driver relies on the address offset to function properly.
> This patch adds the offset preserve mask to NVMe driver when mapping
> via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> depends on the page size defined by CC.MPS register of NVMe
> controller.
> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> ---
>   drivers/nvme/host/pci.c | 19 +++++++++++++++++--
>   1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 856aa31931c1..0b23f04068be 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
>   static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> -
> +	if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
> +		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
>   	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
>   				    rq_dma_dir(req));
>   	else
>   		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
> +	if (dma_set_page_offset_mask(dev->dev, 0))
> +		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
>   }
>   
>   static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
> @@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   {
>   	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>   	blk_status_t ret = BLK_STS_RESOURCE;
> -	int nr_mapped;
> +	int nr_mapped, offset_ret;
>   
>   	if (blk_rq_nr_phys_segments(req) == 1) {
>   		struct bio_vec bv = req_bvec(req);
> @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>   	if (!iod->nents)
>   		goto out_free_sg;
>   
> +	offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
> +	if (offset_ret) {
> +		dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n");
> +		goto out_free_sg;
> +	}
> +
>   	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>   		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
>   				iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   	else
>   		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>   					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> +	offset_ret = dma_set_page_offset_mask(dev->dev, 0);
> +	if (offset_ret) {
> +		dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n");
> +		goto out_free_sg;

If it were possible for this to fail, you might leak the DMA mapping 
here. However if dev->dma_parms somehow disappeared since a dozen lines 
above then I think you've got far bigger problems anyway.

That said, do you really need to keep toggling this back and forth all 
the time? Even if the device does make other mappings elsewhere that 
don't necessarily need the same strict alignment, would it be 
significantly harmful just to set it once at probe and leave it in place 
anyway?

Robin.

> +	}
>   	if (!nr_mapped)
>   		goto out_free_sg;
>   
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
  2021-01-28 17:34     ` Keith Busch
@ 2021-01-28 18:06       ` Jianxiong Gao via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28 18:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: heikki.krogerus, sagi, Saravana Kannan, Konrad Rzeszutek Wilk,
	Marc Orr, gregkh, rafael.j.wysocki, linux-kernel, linux-nvme,
	axboe, bgolaszewski, iommu, jroedel, dan.j.williams,
	andriy.shevchenko, robin.murphy, hch


[-- Attachment #1.1: Type: text/plain, Size: 1146 bytes --]

The error can't be fixed by just updating the NVMe driver.
The NVMe spec (and as pointed out by Chirstoph, some other drivers) rely on
the offset of address to copy data correctly. When data is mapped via
swiotlb,
the current implementation always copy the data at 2k/4k aligned address.


On Thu, Jan 28, 2021 at 9:35 AM Keith Busch <kbusch@kernel.org> wrote:

> On Thu, Jan 28, 2021 at 12:15:28PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> > > For devices that need to preserve address offset on mapping through
> > > swiotlb, this patch adds offset preserving based on page_offset_mask
> > > and keeps the offset if the mask is non zero. This is needed for
> > > device drivers like NVMe.
> >
> > <scratches his head>
> >
> > Didn't you send this patch like a month ago and someone pointed
> > out that the right fix would be in the NVMe driver?
> >
> > Is there an issue with fixing the NVMe driver?
>
> You got it backwards. The initial "fix" used a flag specific to the nvme
> driver, and it was pointed out that it should just be the generic
> behaviour.
>


-- 
Jianxiong Gao

[-- Attachment #1.2: Type: text/html, Size: 1693 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] Adding page_offset_mask to device_dma_parameters
  2021-01-28 17:27   ` Robin Murphy
@ 2021-01-28 18:15     ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-01-28 18:15 UTC (permalink / raw)
  To: Robin Murphy
  Cc: axboe, heikki.krogerus, sagi, saravanak, konrad.wilk, marcorr,
	gregkh, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
	kbusch, bgolaszewski, iommu, jroedel, linux-nvme, dan.j.williams,
	Jianxiong Gao, hch

On Thu, Jan 28, 2021 at 05:27:25PM +0000, Robin Murphy wrote:
> On 2021-01-28 00:38, Jianxiong Gao wrote:
>> Some devices rely on the address offset in a page to function
>> correctly (NVMe driver as an example). These devices may use
>> a different page size than the Linux kernel. The address offset
>> has to be preserved upon mapping, and in order to do so, we
>> need to record the page_offset_mask first.
>>
>> Signed-off-by: Jianxiong Gao <jxgao@google.com>
>> ---
>>   include/linux/device.h      |  1 +
>>   include/linux/dma-mapping.h | 17 +++++++++++++++++
>>   2 files changed, 18 insertions(+)
>>
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 1779f90eeb4c..f44e0659fc66 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -292,6 +292,7 @@ struct device_dma_parameters {
>>   	 */
>>   	unsigned int max_segment_size;
>>   	unsigned long segment_boundary_mask;
>> +	unsigned int page_offset_mask;
>
> Could we call this something more like "min_align_mask" (sorry, I can't 
> think of a name that's actually good and descriptive right now). 
> Essentially I worry that having "page" in there is going to be too easy to 
> misinterpret as having anything to do what "page" means almost everywhere 
> else (even before you throw IOMMU pages into the mix).
>
> Also note that of all the possible ways to pack two ints and a long, this 
> one is the worst ;)

The block layer uses virt_boundary for the related concept, but that
is pretty horrible too.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
  2021-01-28  0:38 ` [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao via iommu
  2021-01-28 17:15   ` Konrad Rzeszutek Wilk
@ 2021-01-28 18:16   ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-01-28 18:16 UTC (permalink / raw)
  To: Jianxiong Gao
  Cc: axboe, heikki.krogerus, sagi, saravanak, konrad.wilk, marcorr,
	gregkh, rafael.j.wysocki, linux-kernel, linux-nvme, kbusch,
	bgolaszewski, iommu, jroedel, dan.j.williams, andriy.shevchenko,
	robin.murphy, hch

On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> For devices that need to preserve address offset on mapping through
> swiotlb, this patch adds offset preserving based on page_offset_mask
> and keeps the offset if the mask is non zero. This is needed for
> device drivers like NVMe.
> 
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> ---
>  kernel/dma/swiotlb.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..4cab35f2c9bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  	dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
>  	unsigned long flags;
>  	phys_addr_t tlb_addr;
> -	unsigned int nslots, stride, index, wrap;
> +	unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
>  	int i;
>  	unsigned long mask;
>  	unsigned long offset_slots;
> @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  		    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>  		    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>  
> +	page_offset_mask = dma_get_page_offset_mask(hwdev);
> +	page_offset = orig_addr & page_offset_mask;
> +	alloc_size += page_offset;
> +
>  	/*
>  	 * For mappings greater than or equal to a page, we limit the stride
>  	 * (and hence alignment) to a page size.
>  	 */
>  	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> -	if (alloc_size >= PAGE_SIZE)
> +	if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << IO_TLB_SHIFT)))
>  		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>  	else
>  		stride = 1;
> @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>  	 */
>  	for (i = 0; i < nslots; i++)
>  		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> +	/*
> +	 * When keeping the offset of the original data, we need to advance
> +	 * the tlb_addr by the offset of orig_addr.
> +	 */
> +	tlb_addr += page_offset;
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>  	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>  		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
> @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
>  			      enum dma_data_direction dir, unsigned long attrs)
>  {
>  	unsigned long flags;
> -	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> +	unsigned int num_page_offset_slabs, page_offset_mask = dma_get_page_offset_mask(hwdev);

Yikes, please avoid these crazy long lines.

> +	num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << IO_TLB_SHIFT);

also a double whitespace here.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
  2021-01-28 18:00   ` Robin Murphy
@ 2021-01-28 18:18     ` Christoph Hellwig
  2021-01-28 18:28       ` Jianxiong Gao via iommu
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-01-28 18:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: axboe, heikki.krogerus, sagi, saravanak, konrad.wilk, marcorr,
	gregkh, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
	kbusch, bgolaszewski, iommu, jroedel, linux-nvme, dan.j.williams,
	Jianxiong Gao, hch

On Thu, Jan 28, 2021 at 06:00:58PM +0000, Robin Murphy wrote:
> If it were possible for this to fail, you might leak the DMA mapping here. 
> However if dev->dma_parms somehow disappeared since a dozen lines above 
> then I think you've got far bigger problems anyway.
>
> That said, do you really need to keep toggling this back and forth all the 
> time? Even if the device does make other mappings elsewhere that don't 
> necessarily need the same strict alignment, would it be significantly 
> harmful just to set it once at probe and leave it in place anyway?

Yes, we should kept it set all the time.  While some NVMe devices have
the optional to use SGLs that do not have this limitation, I have
absolutely no sympathy for anyone running NVMe with swiotlb as that
means their system imposes an addressing limitation.  We need to make
sure it does not corrupt data, but we're not going to make any effort
to optimize for such a degenerated setup.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
  2021-01-28 18:18     ` Christoph Hellwig
@ 2021-01-28 18:28       ` Jianxiong Gao via iommu
  0 siblings, 0 replies; 14+ messages in thread
From: Jianxiong Gao via iommu @ 2021-01-28 18:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, heikki.krogerus, sagi, Saravana Kannan,
	Konrad Rzeszutek Wilk, Marc Orr, gregkh, rafael.j.wysocki,
	linux-kernel, linux-nvme, Keith Busch, bgolaszewski, iommu,
	jroedel, dan.j.williams, andriy.shevchenko, Robin Murphy


[-- Attachment #1.1: Type: text/plain, Size: 4023 bytes --]

I have tried to set it once at probe and then leave it in place, however
the NVMe driver does not seem to like it, and the VM does not boot
correctly. I have spent a couple days debugging but I am a bit lost
now.

Basically whenever nvme_setup_prp_simple
<https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/nvme/host/pci.c#L803>
is
mapping with the mask,
I am getting timeout issues on boot, which to my knowledge shows NVMe
driver failure:

> [    5.500662] random: crng init done
> [    5.502933] random: 7 urandom warning(s) missed due to ratelimiting
> [  132.077795] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
> [  132.614755] dracut-initqueue[472]: Warning: dracut-initqueue timeout -
> starting timeout scripts
>

I have checked that all the mappings happened correctly:

> [    4.773570] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fffffc9acd6c6040, mapped at 7affb200
> [    4.784540] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fffffc9acd6c6040, mapped at 7affc400
> [    4.794096] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fffffc9acd6c6040, mapped at 7affd600
> [    4.801983] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fffffc9acd6c6040, mapped at 7affe800
> [    4.806873] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset a00, from fffffc9acd6c6040, mapped at 7afffa00
> [    4.812382] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset c00, from fffffc9acd6c6040, mapped at 7b000c00
> [    4.817423] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset e00, from fffffc9acd6c6040, mapped at 7b001e00
> [    4.823652] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 200, from fffffc9acd6c60c0, mapped at 7b003200
> [    4.828679] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 400, from fffffc9acd6c60c0, mapped at 7b004400
> [    4.833875] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 600, from fffffc9acd6c60c0, mapped at 7b005600
> [    4.838926] nvme 0000:00:04.0: nvme setup prp simple is mapping 200
> data, with offset 800, from fffffc9acd6c60c0, mapped at 7b006800

...

I have compared it to not setting the mask. The only difference in result is
that instead of being mapped to *200|* 400|*600 etc, they are all mapped
to *000. So I believe the mapping is done correctly, and according to NVMe
<https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4b-2020.09.21-Ratified.pdf>
spec
<https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4b-2020.09.21-Ratified.pdf>
figure
108/109, the mapping should have the offset kept. I am not
sure what caused the error that eventually led to the failure. Is there
another
bug in the NVMe driver?

On Thu, Jan 28, 2021 at 10:18 AM Christoph Hellwig <hch@lst.de> wrote:

> On Thu, Jan 28, 2021 at 06:00:58PM +0000, Robin Murphy wrote:
> > If it were possible for this to fail, you might leak the DMA mapping
> here.
> > However if dev->dma_parms somehow disappeared since a dozen lines above
> > then I think you've got far bigger problems anyway.
> >
> > That said, do you really need to keep toggling this back and forth all
> the
> > time? Even if the device does make other mappings elsewhere that don't
> > necessarily need the same strict alignment, would it be significantly
> > harmful just to set it once at probe and leave it in place anyway?
>
> Yes, we should kept it set all the time.  While some NVMe devices have
> the optional to use SGLs that do not have this limitation, I have
> absolutely no sympathy for anyone running NVMe with swiotlb as that
> means their system imposes an addressing limitation.  We need to make
> sure it does not corrupt data, but we're not going to make any effort
> to optimize for such a degenerated setup.
>


-- 
Jianxiong Gao

[-- Attachment #1.2: Type: text/html, Size: 5139 bytes --]

[-- Attachment #2: Type: text/plain, Size: 156 bytes --]

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-01-28 18:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28  0:38 [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Jianxiong Gao via iommu
2021-01-28  0:38 ` [PATCH 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao via iommu
2021-01-28 17:27   ` Robin Murphy
2021-01-28 18:15     ` Christoph Hellwig
2021-01-28  0:38 ` [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao via iommu
2021-01-28 17:15   ` Konrad Rzeszutek Wilk
2021-01-28 17:34     ` Keith Busch
2021-01-28 18:06       ` Jianxiong Gao via iommu
2021-01-28 18:16   ` Christoph Hellwig
2021-01-28  0:38 ` [PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao via iommu
2021-01-28 18:00   ` Robin Murphy
2021-01-28 18:18     ` Christoph Hellwig
2021-01-28 18:28       ` Jianxiong Gao via iommu
2021-01-28  8:04 ` [PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.* Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).