From: Christoph Hellwig <hch@lst.de>
To: Marc Orr <marcorr@google.com>
Cc: kbusch@kernel.org, axboe@fb.com, hch@lst.de, sagi@grimberg.me,
	jxgao@google.com, linux-nvme@lists.infradead.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] nvme: fix handling mapping failure
Date: Tue, 19 Jan 2021 19:00:24 +0100	[thread overview]
Message-ID: <20210119180024.GA28024@lst.de> (raw)
In-Reply-To: <20210119175336.4016923-1-marcorr@google.com>

On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> This patch ensures that when `nvme_map_data()` fails to map the
> addresses in a scatter/gather list:
> 
> * The addresses are not incorrectly unmapped. The underlying
> scatter/gather code unmaps the addresses after detecting a failure.
> Thus, unmapping them again in the driver is a bug.
> * DMA pool allocations are not freed when they were never allocated
> in the first place.
> 
> The bug that motivated this patch was the following sequence, which
> occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> 
> * NVMe driver calls dma_direct_map_sg()
> * dma_direct_map_sg() fails partway through the scatter/gather list
> * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
>   that were successfully mapped.
> * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
>   double unmap, which is a bug.
> 
> Before this patch, I observed intermittent application- and VM-level
> failures when running a benchmark, fio, in an AMD SEV guest. This patch
> resolves the failures.

I think the right way to fix this is to just do a proper unwind instead
of calling a catchall function.  Can you try this patch?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 25456d02eddb8c..47d7075053b6b2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
 	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
 	if (!iod->nents)
-		goto out;
+		goto out_free_sg;
 
 	if (is_pci_p2pdma_page(sg_page(iod->sg)))
 		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
@@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
 					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
 	if (!nr_mapped)
-		goto out;
+		goto out_free_sg;
 
 	iod->use_sgl = nvme_pci_use_sgls(dev, req);
 	if (iod->use_sgl)
 		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
 	else
 		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
-out:
 	if (ret != BLK_STS_OK)
-		nvme_unmap_data(dev, req);
+		goto out_dma_unmap;
+	return BLK_STS_OK;
+
+out_dma_unmap:
+	if (is_pci_p2pdma_page(sg_page(iod->sg)))
+		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
+				    rq_dma_dir(req));
+	else
+		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
+out_free_sg:
+	mempool_free(iod->sg, dev->iod_mempool);
 	return ret;
 }
 

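As context for the unwind above, here is a minimal sketch of the
error-handling contract the patch relies on.  This is illustrative
only -- toy_map_data() and toy_setup_cmd() are hypothetical stand-ins,
not code from the patch or the kernel tree.  The point is that
dma_map_sg_attrs() returning 0 means the DMA layer has already undone
any partial mappings, so the caller must not call dma_unmap_sg() on
that path and only releases what it allocated itself:

#include <linux/dma-mapping.h>
#include <linux/mempool.h>
#include <linux/blk_types.h>

/* stand-in for nvme_pci_setup_prps()/nvme_pci_setup_sgls() */
static blk_status_t toy_setup_cmd(struct scatterlist *sg, int nr_mapped);

static blk_status_t toy_map_data(struct device *dev, struct scatterlist *sg,
				 int nents, enum dma_data_direction dir,
				 mempool_t *pool)
{
	blk_status_t ret;
	int nr_mapped;

	nr_mapped = dma_map_sg_attrs(dev, sg, nents, dir, DMA_ATTR_NO_WARN);
	if (!nr_mapped) {
		/* any partial mappings were already unwound by the DMA layer */
		ret = BLK_STS_RESOURCE;
		goto out_free_sg;
	}

	ret = toy_setup_cmd(sg, nr_mapped);
	if (ret != BLK_STS_OK)
		goto out_dma_unmap;
	return BLK_STS_OK;

out_dma_unmap:
	/* only reached for mappings this function created itself */
	dma_unmap_sg(dev, sg, nents, dir);
out_free_sg:
	/* the scatterlist buffer is always ours to release */
	mempool_free(sg, pool);
	return ret;
}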

Thread overview:
2021-01-19 17:53 [PATCH] nvme: fix handling mapping failure Marc Orr
2021-01-19 18:00 ` Christoph Hellwig [this message]
2021-01-19 23:12   ` Marc Orr
