* [PATCH] nvme: fix handling mapping failure
From: Marc Orr @ 2021-01-19 17:53 UTC
To: kbusch, axboe, hch, sagi, jxgao, linux-nvme, linux-kernel
Cc: Marc Orr, stable

This patch ensures that when `nvme_map_data()` fails to map the
addresses in a scatter/gather list:

* The addresses are not incorrectly unmapped. The underlying
  scatter/gather code unmaps the addresses after detecting a failure.
  Thus, unmapping them again in the driver is a bug.
* The DMA pool allocations are not deallocated when they were never
  allocated.

The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel parameter
`swiotlb=force`.

* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails partway through the scatter/gather list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
  that were mapped successfully.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
  double unmap, which is a bug.

Before this patch, I observed intermittent application- and VM-level
failures when running a benchmark, fio, in an AMD SEV guest. This patch
resolves the failures.

Tested-by: Marc Orr <marcorr@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Marc Orr <marcorr@google.com>
---
drivers/nvme/host/pci.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9b1fc8633cfe..8b504ed08321 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -543,11 +543,14 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 	WARN_ON_ONCE(!iod->nents);
-	if (is_pci_p2pdma_page(sg_page(iod->sg)))
-		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-				    rq_dma_dir(req));
-	else
-		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+	if (!dma_mapping_error(dev->dev, iod->first_dma)) {
+		if (is_pci_p2pdma_page(sg_page(iod->sg)))
+			pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+					    rq_dma_dir(req));
+		else
+			dma_unmap_sg(dev->dev, iod->sg, iod->nents,
+				     rq_dma_dir(req));
+	}
 	if (iod->npages == 0)
@@ -836,8 +839,11 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	else
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
-	if (!nr_mapped)
+	if (!nr_mapped) {
+		iod->first_dma = DMA_MAPPING_ERROR;
+		iod->npages = -1;
 		goto out;
+	}
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
--
2.30.0.284.gd98b1dd5eaa7-goog
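
To make the failure sequence from the commit message concrete, here is a
minimal userspace sketch in C. It is not kernel code: `demo_sg`,
`demo_map_sg()` and `demo_unmap_sg()` are hypothetical stand-ins for the
scatter/gather list and the DMA mapping calls, and the `bad_unmaps`
counter is an assumption added only to make a double unmap observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NENTS 4

/* Hypothetical model of a scatter/gather list; not a kernel type. */
struct demo_sg {
	bool mapped[NENTS];	/* true while entry i is mapped */
	int bad_unmaps;		/* unmaps issued against unmapped entries */
};

/* Unmap the first nents entries, counting any that were not mapped. */
static void demo_unmap_sg(struct demo_sg *sg, size_t nents)
{
	assert(nents <= NENTS);
	for (size_t i = 0; i < nents; i++) {
		if (!sg->mapped[i])
			sg->bad_unmaps++;
		sg->mapped[i] = false;
	}
}

/*
 * Map entries in order. If entry fail_at cannot be mapped, unwind the
 * entries mapped so far and return 0, mirroring the behavior the commit
 * message describes for the mapping code.
 */
static size_t demo_map_sg(struct demo_sg *sg, size_t nents, size_t fail_at)
{
	assert(nents <= NENTS);
	for (size_t i = 0; i < nents; i++) {
		if (i == fail_at) {
			demo_unmap_sg(sg, i);	/* internal unwind */
			return 0;
		}
		sg->mapped[i] = true;
	}
	return nents;
}
```

In this model, a failed `demo_map_sg()` leaves nothing mapped, so a
caller that reacts to the 0 return by calling `demo_unmap_sg()` on the
whole list unmaps entries that are no longer mapped. That is the
redundant call the patch is meant to avoid.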
* Re: [PATCH] nvme: fix handling mapping failure
From: Christoph Hellwig @ 2021-01-19 18:00 UTC
To: Marc Orr
Cc: kbusch, axboe, hch, sagi, jxgao, linux-nvme, linux-kernel, stable
On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> This patch ensures that when `nvme_map_data()` fails to map the
> addresses in a scatter/gather list:
>
> * The addresses are not incorrectly unmapped. The underlying
> scatter/gather code unmaps the addresses after detecting a failure.
> Thus, unmapping them again in the driver is a bug.
> * The DMA pool allocations are not deallocated when they were never
> allocated.
>
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
>
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails partway through the scatter/gather list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   that were mapped successfully.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> double unmap, which is a bug.
>
> Before this patch, I observed intermittent application- and VM-level
> failures when running a benchmark, fio, in an AMD SEV guest. This patch
> resolves the failures.

I think the right way to fix this is to just do a proper unwind instead
of calling a catch-all function. Can you try this patch?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb8c..47d7075053b6b2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
-		goto out;
+		goto out_free_sg;
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
@@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
-		goto out;
+		goto out_free_sg;
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-out:
 	if (ret != BLK_STS_OK)
-		nvme_unmap_data(dev, req);
+		goto out_dma_unmap;
+	return BLK_STS_OK;
+
+out_dma_unmap:
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+				    rq_dma_dir(req));
+	else
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+out_free_sg:
+	mempool_free(iod->sg, dev->iod_mempool);
 	return ret;
 }
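
The unwind structure in the patch above can be sketched outside the
kernel as follows. This is only an illustration under stated
assumptions: `mock_iod`, `mock_map_sg()`, `mock_unmap_sg()` and
`mock_map_data()` are hypothetical userspace stand-ins, `malloc()` and
`free()` stand in for the mempool, and the `double_unmaps` counter
exists only so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-request state; not the driver's type. */
struct mock_iod {
	int *sg;		/* stands in for the list taken from the mempool */
	bool dma_mapped;	/* true while the list is DMA-mapped */
	int double_unmaps;	/* unmaps issued while nothing was mapped */
};

/* Mock mapping call: on failure it leaves nothing mapped and returns 0. */
static int mock_map_sg(struct mock_iod *iod, bool fail)
{
	if (fail)
		return 0;
	iod->dma_mapped = true;
	return 1;
}

static void mock_unmap_sg(struct mock_iod *iod)
{
	if (!iod->dma_mapped)
		iod->double_unmaps++;
	iod->dma_mapped = false;
}

/* Returns 0 on success, -1 on failure. Each label undoes one step. */
static int mock_map_data(struct mock_iod *iod, bool fail_map, bool fail_setup)
{
	int ret = -1;

	assert(iod != NULL);
	iod->sg = malloc(sizeof(*iod->sg));
	if (!iod->sg)
		return ret;

	if (!mock_map_sg(iod, fail_map))
		goto out_free_sg;	/* nothing is mapped, skip the unmap */

	if (fail_setup)			/* stands in for PRP/SGL setup failing */
		goto out_dma_unmap;
	return 0;

out_dma_unmap:
	mock_unmap_sg(iod);
out_free_sg:
	free(iod->sg);
	iod->sg = NULL;
	return ret;
}
```

Because a mapping failure jumps straight to `out_free_sg`, the unmap
step runs only on paths where the mapping actually succeeded, so
`double_unmaps` stays at zero in both failure cases of this sketch.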
* Re: [PATCH] nvme: fix handling mapping failure
From: Marc Orr @ 2021-01-19 23:12 UTC
To: Christoph Hellwig
Cc: kbusch, axboe, sagi, Jianxiong Gao, linux-nvme, linux-kernel, stable
On Tue, Jan 19, 2021 at 10:00 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> > This patch ensures that when `nvme_map_data()` fails to map the
> > addresses in a scatter/gather list:
> >
> > * The addresses are not incorrectly unmapped. The underlying
> > scatter/gather code unmaps the addresses after detecting a failure.
> > Thus, unmapping them again in the driver is a bug.
> > * The DMA pool allocations are not deallocated when they were never
> > allocated.
> >
> > The bug that motivated this patch was the following sequence, which
> > occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> >
> > * NVMe driver calls dma_direct_map_sg()
> > * dma_direct_map_sg() fails partway through the scatter/gather list
> > * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
> >   that were mapped successfully.
> > * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> > double unmap, which is a bug.
> >
> > Before this patch, I observed intermittent application- and VM-level
> > failures when running a benchmark, fio, in an AMD SEV guest. This patch
> > resolves the failures.
>
> I think the right way to fix this is to just do a proper unwind instead
> of calling a catch-all function. Can you try this patch?

Done. It works great, thanks! Shall I send out a v2 with what you've proposed?
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 25456d02eddb8c..47d7075053b6b2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
>  	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
>  	if (!iod->nents)
> -		goto out;
> +		goto out_free_sg;
>
>  	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>  		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> @@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>  					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>  	if (!nr_mapped)
> -		goto out;
> +		goto out_free_sg;
>
>  	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>  	if (iod->use_sgl)
>  		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>  	else
>  		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
> -out:
>  	if (ret != BLK_STS_OK)
> -		nvme_unmap_data(dev, req);
> +		goto out_dma_unmap;
> +	return BLK_STS_OK;
> +
> +out_dma_unmap:
> +	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> +		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
> +				    rq_dma_dir(req));
> +	else
> +		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));

Do you think it's worth hoisting this sg unmap snippet into a helper
that can be called both from here and from nvme_unmap_data()?

> +out_free_sg:
> +	mempool_free(iod->sg, dev->iod_mempool);
>  	return ret;
>  }
>
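
For illustration, the helper I have in mind could look roughly like the
userspace sketch below. None of these names exist in the driver:
`helper_iod`, `helper_unmap_sg()`, `helper_map_data()` and
`helper_unmap_data()` are hypothetical, and the two counters merely
stand in for the calls to pci_p2pdma_unmap_sg() and dma_unmap_sg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-request state; not the driver's type. */
struct helper_iod {
	bool is_p2pdma;		/* models is_pci_p2pdma_page(sg_page(iod->sg)) */
	int p2p_unmaps;		/* models calls to pci_p2pdma_unmap_sg() */
	int dma_unmaps;		/* models calls to dma_unmap_sg() */
};

/* The single place that chooses between the two unmap flavors. */
static void helper_unmap_sg(struct helper_iod *iod)
{
	assert(iod != NULL);
	if (iod->is_p2pdma)
		iod->p2p_unmaps++;
	else
		iod->dma_unmaps++;
}

/* Error path of the map function: unmap only after a successful map. */
static int helper_map_data(struct helper_iod *iod, bool fail_setup)
{
	if (fail_setup) {
		helper_unmap_sg(iod);
		return -1;
	}
	return 0;
}

/* Normal completion path: same helper, followed by the other cleanup. */
static void helper_unmap_data(struct helper_iod *iod)
{
	helper_unmap_sg(iod);
	/* the real function would also release its other resources here */
}
```

The point of the sketch is only that both callers share one branch on
the page type, so the P2PDMA and regular paths cannot drift apart.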