* [PATCH V2 1/3] Adding page_offset_mask to device_dma_parameters
2021-02-01 18:30 [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Jianxiong Gao
@ 2021-02-01 18:30 ` Jianxiong Gao
2021-02-01 18:30 ` [PATCH V2 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Jianxiong Gao @ 2021-02-01 18:30 UTC (permalink / raw)
To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel
Some devices rely on the address offset in a page to function
correctly (NVMe driver as an example). These devices may use
a different page size than the Linux kernel. The address offset
has to be preserved upon mapping, and in order to do so, we
need to record the page_offset_mask first.
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
include/linux/device.h | 1 +
include/linux/dma-mapping.h | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/include/linux/device.h b/include/linux/device.h
index 1779f90eeb4c..7960bf516dd7 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -291,6 +291,7 @@ struct device_dma_parameters {
* sg limitations.
*/
unsigned int max_segment_size;
+ unsigned int min_align_mask;
unsigned long segment_boundary_mask;
};
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2e49996a8f39..27ec3cab8cbd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask)
return -EIO;
}
+static inline unsigned int dma_get_min_align_mask(struct device *dev)
+{
+ if (dev->dma_parms)
+ return dev->dma_parms->min_align_mask;
+ return 0;
+}
+
+static inline int dma_set_min_align_mask(struct device *dev,
+ unsigned int min_align_mask)
+{
+ if (dev->dma_parms) {
+ dev->dma_parms->min_align_mask = min_align_mask;
+ return 0;
+ }
+ return -EIO;
+}
+
static inline int dma_get_cache_alignment(void)
{
#ifdef ARCH_DMA_MINALIGN
--
2.27.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH V2 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
2021-02-01 18:30 [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Jianxiong Gao
2021-02-01 18:30 ` [PATCH V2 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao
@ 2021-02-01 18:30 ` Jianxiong Gao
2021-02-01 18:30 ` [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao
2021-02-02 11:53 ` [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Greg KH
3 siblings, 0 replies; 15+ messages in thread
From: Jianxiong Gao @ 2021-02-01 18:30 UTC (permalink / raw)
To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel
For devices that need to preserve address offset on mapping through
swiotlb, this patch adds offset preserving based on page_offset_mask
and keeps the offset if the mask is non zero. This is needed for
device drivers like NVMe.
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
kernel/dma/swiotlb.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7c42df6e6100..eeb640df35f3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
unsigned long flags;
phys_addr_t tlb_addr;
- unsigned int nslots, stride, index, wrap;
+ unsigned int nslots, stride, index, wrap, min_align_mask, page_offset;
int i;
unsigned long mask;
unsigned long offset_slots;
@@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
: 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+ min_align_mask = dma_get_min_align_mask(hwdev);
+ page_offset = orig_addr & min_align_mask;
+ alloc_size += page_offset;
+
/*
* For mappings greater than or equal to a page, we limit the stride
* (and hence alignment) to a page size.
*/
nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
- if (alloc_size >= PAGE_SIZE)
+ if ((alloc_size >= PAGE_SIZE) || (min_align_mask > (1 << IO_TLB_SHIFT)))
stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
else
stride = 1;
@@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
*/
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+ /*
+ * When keeping the offset of the original data, we need to advance
+ * the tlb_addr by the offset of orig_addr.
+ */
+ tlb_addr += page_offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
@@ -598,7 +607,11 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
enum dma_data_direction dir, unsigned long attrs)
{
unsigned long flags;
- int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+ unsigned int num_page_offset_slabs;
+ unsigned int min_align_mask = dma_get_min_align_mask(hwdev);
+ int i, count;
+ int nslots = ALIGN(alloc_size + (tlb_addr & min_align_mask),
+ 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = io_tlb_orig_addr[index];
@@ -610,6 +623,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+ /*
+ * When dma_get_min_align_mask is used, we may have padded more slabs
+ * when padding exceeds one slab. We need to move index back to the
+ * beginning of the padding.
+ */
+ num_page_offset_slabs = (tlb_addr & min_align_mask) / (1 << IO_TLB_SHIFT);
+ index -= num_page_offset_slabs;
+
/*
* Return the buffer to the free list by setting the corresponding
* entries to indicate the number of contiguous entries available.
--
2.27.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
2021-02-01 18:30 [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Jianxiong Gao
2021-02-01 18:30 ` [PATCH V2 1/3] Adding page_offset_mask to device_dma_parameters Jianxiong Gao
2021-02-01 18:30 ` [PATCH V2 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero Jianxiong Gao
@ 2021-02-01 18:30 ` Jianxiong Gao
2021-02-01 18:55 ` Andy Shevchenko
2021-02-01 20:57 ` Keith Busch
2021-02-02 11:53 ` [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Greg KH
3 siblings, 2 replies; 15+ messages in thread
From: Jianxiong Gao @ 2021-02-01 18:30 UTC (permalink / raw)
To: jxgao, erdemaktas, marcorr, hch, m.szyprowski, robin.murphy,
gregkh, saravanak, heikki.krogerus, rafael.j.wysocki,
andriy.shevchenko, dan.j.williams, bgolaszewski, jroedel, iommu,
konrad.wilk, kbusch, axboe, sagi, linux-nvme, linux-kernel
NVMe driver relies on the address offset to function properly.
This patch adds the offset preserve mask to NVMe driver when mapping
via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
depends on the page size defined by CC.MPS register of NVMe
controller.
Signed-off-by: Jianxiong Gao <jxgao@google.com>
---
drivers/nvme/host/pci.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 81e6389b2042..30e45f7e0f75 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req)
static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
+ if (dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1))
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
if (is_pci_p2pdma_page(sg_page(iod->sg)))
pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req));
else
dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+ if (dma_set_min_align_mask(dev->dev, 0))
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
}
static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
@@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
blk_status_t ret = BLK_STS_RESOURCE;
- int nr_mapped;
+ int nr_mapped, offset_ret;
if (blk_rq_nr_phys_segments(req) == 1) {
struct bio_vec bv = req_bvec(req);
@@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
if (!iod->nents)
goto out_free_sg;
+ offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
+ goto out_free_sg;
+ }
+
if (is_pci_p2pdma_page(sg_page(iod->sg)))
nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req), DMA_ATTR_NO_WARN);
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
+ goto out_free_sg;
+ }
if (!nr_mapped)
goto out_free_sg;
--
2.27.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
2021-02-01 18:30 ` [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao
@ 2021-02-01 18:55 ` Andy Shevchenko
2021-02-01 19:35 ` Jianxiong Gao
2021-02-01 20:57 ` Keith Busch
1 sibling, 1 reply; 15+ messages in thread
From: Andy Shevchenko @ 2021-02-01 18:55 UTC (permalink / raw)
To: Jianxiong Gao
Cc: erdemaktas, marcorr, hch, m.szyprowski, robin.murphy, gregkh,
saravanak, heikki.krogerus, rafael.j.wysocki, dan.j.williams,
bgolaszewski, jroedel, iommu, konrad.wilk, kbusch, axboe, sagi,
linux-nvme, linux-kernel
On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> NVMe driver relies on the address offset to function properly.
> This patch adds the offset preserve mask to NVMe driver when mapping
> via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> depends on the page size defined by CC.MPS register of NVMe
> controller.
...
> if (is_pci_p2pdma_page(sg_page(iod->sg)))
> nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> else
> nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> + goto out_free_sg;
> + }
Seems like rebasing effect which makes empty line goes in the middle of some
other group of code lines.
> if (!nr_mapped)
> goto out_free_sg;
Perhaps it should be here?
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
2021-02-01 18:55 ` Andy Shevchenko
@ 2021-02-01 19:35 ` Jianxiong Gao
0 siblings, 0 replies; 15+ messages in thread
From: Jianxiong Gao @ 2021-02-01 19:35 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Erdem Aktas, Marc Orr, Christoph Hellwig, m.szyprowski,
Robin Murphy, gregkh, Saravana Kannan, heikki.krogerus,
rafael.j.wysocki, dan.j.williams, bgolaszewski, jroedel, iommu,
Konrad Rzeszutek Wilk, Keith Busch, axboe, sagi, linux-nvme,
linux-kernel
On Mon, Feb 1, 2021 at 10:56 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> > NVMe driver relies on the address offset to function properly.
> > This patch adds the offset preserve mask to NVMe driver when mapping
> > via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask
> > depends on the page size defined by CC.MPS register of NVMe
> > controller.
>
> ...
>
> > if (is_pci_p2pdma_page(sg_page(iod->sg)))
> > nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> > iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> > else
> > nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> > rq_dma_dir(req), DMA_ATTR_NO_WARN);
> > +
> > + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> > + if (offset_ret) {
> > + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> > + goto out_free_sg;
> > + }
>
> Seems like rebasing effect which makes empty line goes in the middle of some
> other group of code lines.
>
> > if (!nr_mapped)
> > goto out_free_sg;
>
> Perhaps it should be here?
Yes you are correct, it should be
else
nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
rq_dma_dir(req), DMA_ATTR_NO_WARN);
if (!nr_mapped)
goto out_free_sg;
+
+ offset_ret = dma_set_min_align_mask(dev->dev, 0);
+ if (offset_ret) {
+ dev_warn(dev->dev, "dma_set_min_align_mask failed to
reset offset\n");
+ goto out_free_sg;
+ }
Thanks for pointing it out.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
--
Jianxiong Gao
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
2021-02-01 18:30 ` [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao
2021-02-01 18:55 ` Andy Shevchenko
@ 2021-02-01 20:57 ` Keith Busch
[not found] ` <CAMGD6P2Gz9nWELMdsAhwQvXx3PXv2aXet=Tn9Rca61obZawfgw@mail.gmail.com>
1 sibling, 1 reply; 15+ messages in thread
From: Keith Busch @ 2021-02-01 20:57 UTC (permalink / raw)
To: Jianxiong Gao
Cc: erdemaktas, marcorr, hch, m.szyprowski, robin.murphy, gregkh,
saravanak, heikki.krogerus, rafael.j.wysocki, andriy.shevchenko,
dan.j.williams, bgolaszewski, jroedel, iommu, konrad.wilk, axboe,
sagi, linux-nvme, linux-kernel
On Mon, Feb 01, 2021 at 10:30:17AM -0800, Jianxiong Gao wrote:
> @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> if (!iod->nents)
> goto out_free_sg;
>
> + offset_ret = dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to set offset\n");
> + goto out_free_sg;
> + }
> +
> if (is_pci_p2pdma_page(sg_page(iod->sg)))
> nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
> else
> nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> rq_dma_dir(req), DMA_ATTR_NO_WARN);
> +
> + offset_ret = dma_set_min_align_mask(dev->dev, 0);
> + if (offset_ret) {
> + dev_warn(dev->dev, "dma_set_min_align_mask failed to reset offset\n");
> + goto out_free_sg;
> + }
> if (!nr_mapped)
> goto out_free_sg;
Why is this setting being done and undone on each IO? Wouldn't it be
more efficient to set it once during device initialization?
And more importantly, this isn't thread safe: one CPU may be setting the
device's dma alignment mask to 0 while another CPU is expecting it to be
NVME_CTRL_PAGE_SIZE - 1.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed.
2021-02-01 18:30 [PATCH V2 0/3] SWIOTLB: Preserve swiotlb map offset when needed Jianxiong Gao
` (2 preceding siblings ...)
2021-02-01 18:30 ` [PATCH V2 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver Jianxiong Gao
@ 2021-02-02 11:53 ` Greg KH
3 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2021-02-02 11:53 UTC (permalink / raw)
To: Jianxiong Gao
Cc: erdemaktas, marcorr, hch, m.szyprowski, robin.murphy, saravanak,
heikki.krogerus, rafael.j.wysocki, andriy.shevchenko,
dan.j.williams, bgolaszewski, jroedel, iommu, konrad.wilk,
kbusch, axboe, sagi, linux-nvme, linux-kernel
On Mon, Feb 01, 2021 at 10:30:14AM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications may depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This
> patch adds an option to make sure the mapped data preserves its
> offset of the orginal addrss.
>
> Without the patch when creating xfs formatted disk on NVMe backends,
> with swiotlb=force in kernel boot option, creates the following error:
> meta-data=/dev/nvme2n1 isize=512 agcount=4, agsize=131072 blks
> = sectsz=512 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=0, rmapbt=0, refl
> ink=0
> data = bsize=4096 blocks=524288, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=1
> log =internal log bsize=4096 blocks=2560, version=2
> = sectsz=512 sunit=0 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> mkfs.xfs: pwrite failed: Input/output error
>
> Jianxiong Gao (3):
> Adding page_offset_mask to device_dma_parameters
> Add swiotlb offset preserving mapping when
> dma_dma_parameters->page_offset_mask is non zero.
> Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
>
> drivers/nvme/host/pci.c | 4 ++++
> include/linux/device.h | 1 +
> include/linux/dma-mapping.h | 17 +++++++++++++++++
> kernel/dma/swiotlb.c | 16 +++++++++++++++-
> 4 files changed, 37 insertions(+), 1 deletion(-)
>
> --
> 2.27.0
>
You forgot to mention somewhere, what changed from v1 to v2 :(
^ permalink raw reply [flat|nested] 15+ messages in thread