* [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
@ 2018-12-14 23:32 Stephen Warren
  2018-12-18 13:40 ` Tariq Toukan
  2018-12-18 16:32 ` Jason Gunthorpe
  0 siblings, 2 replies; 10+ messages in thread
From: Stephen Warren @ 2018-12-14 23:32 UTC (permalink / raw)
  To: Tariq Toukan, Wei Hu (Xavier)
  Cc: netdev, linux-rdma, Doug Ledford, Jason Gunthorpe, Stephen Warren

From: Stephen Warren <swarren@nvidia.com>

This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page
operation after dma_alloc_coherent") to the mlx4 driver. That change was
described as:

> In general, dma_alloc_coherent() returns a CPU virtual address and
> a DMA address, and we have no guarantee that the underlying memory
> even has an associated struct page at all.
>
> This patch gets rid of the page operation after dma_alloc_coherent,
> and records the VA returned from dma_alloc_coherent in the struct
> of hem in hns RoCE driver.

Differences in this port relative to the hns patch:

1) The hns patch only needed to fix a dma_alloc_coherent path, but this
patch also needs to fix an alloc_pages path. This appears to be simple
except for the next point.

2) The hns patch converted a bunch of code to consistently use
sg_dma_len(mem) rather than a mix of that and mem->length. However, it
seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
using it when calling e.g. __free_pages is problematic. I suspect the
same issue affects mlx4_table_find() somehow too, and similarly I expect
the hns driver has an issue in this area. Instead, this patch converts
everything to use mem->length instead. I'd like some feedback on this
issue.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
Note: I've tested this patch in downstream 4.9 and 4.14 based kernels
(at least ibping works), but can't test it in mainline since my system
isn't supported there yet. I have at least compile-tested it in mainline,
for ARM64.

 drivers/net/ethernet/mellanox/mlx4/icm.c | 51 ++++++++++++++----------
 drivers/net/ethernet/mellanox/mlx4/icm.h |  1 +
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
index 4b4351141b94..b14207cef69e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.c
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
@@ -72,7 +72,7 @@ static void mlx4_free_icm_coherent(struct mlx4_dev *dev, struct mlx4_icm_chunk *
 	for (i = 0; i < chunk->npages; ++i)
 		dma_free_coherent(&dev->persist->pdev->dev,
 				  chunk->mem[i].length,
-				  lowmem_page_address(sg_page(&chunk->mem[i])),
+				  chunk->buf[i],
 				  sg_dma_address(&chunk->mem[i]));
 }
 
@@ -95,9 +95,11 @@ void mlx4_free_icm(struct mlx4_dev *dev, struct mlx4_icm *icm, int coherent)
 	kfree(icm);
 }
 
-static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
+static int mlx4_alloc_icm_pages(struct mlx4_icm_chunk *chunk, int order,
 				gfp_t gfp_mask, int node)
 {
+	struct scatterlist *mem = &chunk->mem[chunk->npages];
+	void **buf = &chunk->buf[chunk->npages];
 	struct page *page;
 
 	page = alloc_pages_node(node, gfp_mask, order);
@@ -108,25 +110,30 @@ static int mlx4_alloc_icm_pages(struct scatterlist *mem, int order,
 	}
 
 	sg_set_page(mem, page, PAGE_SIZE << order, 0);
+	*buf = lowmem_page_address(page);
 	return 0;
 }
 
-static int mlx4_alloc_icm_coherent(struct device *dev, struct scatterlist *mem,
-				    int order, gfp_t gfp_mask)
+static int mlx4_alloc_icm_coherent(struct device *dev,
+				   struct mlx4_icm_chunk *chunk, int order,
+				   gfp_t gfp_mask)
 {
-	void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
-				       &sg_dma_address(mem), gfp_mask);
-	if (!buf)
+	struct scatterlist *mem = &chunk->mem[chunk->npages];
+	void **buf = &chunk->buf[chunk->npages];
+
+	*buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
+				  &sg_dma_address(mem), gfp_mask);
+	if (!*buf)
 		return -ENOMEM;
 
-	if (offset_in_page(buf)) {
+	if (offset_in_page(*buf)) {
 		dma_free_coherent(dev, PAGE_SIZE << order,
-				  buf, sg_dma_address(mem));
+				  *buf, sg_dma_address(mem));
 		return -ENOMEM;
 	}
 
-	sg_set_buf(mem, buf, PAGE_SIZE << order);
 	sg_dma_len(mem) = PAGE_SIZE << order;
+	mem->length = PAGE_SIZE << order;
 	return 0;
 }
 
@@ -174,6 +181,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 			sg_init_table(chunk->mem, MLX4_ICM_CHUNK_LEN);
 			chunk->npages = 0;
 			chunk->nsg    = 0;
+			memset(chunk->buf, 0, sizeof(chunk->buf));
 			list_add_tail(&chunk->list, &icm->chunk_list);
 		}
 
@@ -186,11 +194,9 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
 
 		if (coherent)
 			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
-						      &chunk->mem[chunk->npages],
-						      cur_order, mask);
+						      chunk, cur_order, mask);
 		else
-			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
-						   cur_order, mask,
+			ret = mlx4_alloc_icm_pages(chunk, cur_order, mask,
 						   dev->numa_node);
 
 		if (ret) {
@@ -316,11 +322,11 @@ void mlx4_table_put(struct mlx4_dev *dev, struct mlx4_icm_table *table, u32 obj)
 void *mlx4_table_find(struct mlx4_icm_table *table, u32 obj,
 			dma_addr_t *dma_handle)
 {
-	int offset, dma_offset, i;
+	int offset, dma_offset, i, length;
 	u64 idx;
 	struct mlx4_icm_chunk *chunk;
 	struct mlx4_icm *icm;
-	struct page *page = NULL;
+	void *addr = NULL;
 
 	if (!table->lowmem)
 		return NULL;
@@ -336,28 +342,29 @@ void *mlx4_table_find(struct mlx4_icm_table *table, u32 obj,
 
 	list_for_each_entry(chunk, &icm->chunk_list, list) {
 		for (i = 0; i < chunk->npages; ++i) {
+			length = chunk->mem[i].length;
 			if (dma_handle && dma_offset >= 0) {
-				if (sg_dma_len(&chunk->mem[i]) > dma_offset)
+				if (length > dma_offset)
 					*dma_handle = sg_dma_address(&chunk->mem[i]) +
 						dma_offset;
-				dma_offset -= sg_dma_len(&chunk->mem[i]);
+				dma_offset -= length;
 			}
 			/*
 			 * DMA mapping can merge pages but not split them,
 			 * so if we found the page, dma_handle has already
 			 * been assigned to.
 			 */
-			if (chunk->mem[i].length > offset) {
-				page = sg_page(&chunk->mem[i]);
+			if (length > offset) {
+				addr = chunk->buf[i] + offset;
 				goto out;
 			}
-			offset -= chunk->mem[i].length;
+			offset -= length;
 		}
 	}
 
 out:
 	mutex_unlock(&table->mutex);
-	return page ? lowmem_page_address(page) + offset : NULL;
+	return addr;
 }
 
 int mlx4_table_get_range(struct mlx4_dev *dev, struct mlx4_icm_table *table,
diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.h b/drivers/net/ethernet/mellanox/mlx4/icm.h
index c9169a490557..a565188dc391 100644
--- a/drivers/net/ethernet/mellanox/mlx4/icm.h
+++ b/drivers/net/ethernet/mellanox/mlx4/icm.h
@@ -52,6 +52,7 @@ struct mlx4_icm_chunk {
 	int			npages;
 	int			nsg;
 	struct scatterlist	mem[MLX4_ICM_CHUNK_LEN];
+	void			*buf[MLX4_ICM_CHUNK_LEN];
 };
 
 struct mlx4_icm {
-- 
2.19.2


* RE: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-14 23:32 [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent Stephen Warren
@ 2018-12-18 13:40 ` Tariq Toukan
  2018-12-18 16:32 ` Jason Gunthorpe
  1 sibling, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2018-12-18 13:40 UTC (permalink / raw)
  To: Stephen Warren, Wei Hu (Xavier)
  Cc: netdev, linux-rdma, Doug Ledford, Jason Gunthorpe, Stephen Warren

Hi Stephen,

Thanks for your patch.

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Stephen Warren
> Sent: Saturday, December 15, 2018 1:33 AM
> To: Tariq Toukan <tariqt@mellanox.com>; Wei@wwwdotorg.org;
> Hu@wwwdotorg.org; \ (Xavier\) <xavier.huwei@huawei.com>
> Cc: netdev@vger.kernel.org; linux-rdma@vger.kernel.org; Doug Ledford
> <dledford@redhat.com>; Jason Gunthorpe <jgg@mellanox.com>; Stephen
> Warren <swarren@nvidia.com>
> Subject: [RFC PATCH] net/mlx4: Get rid of page operation after
> dma_alloc_coherent
> 
> From: Stephen Warren <swarren@nvidia.com>
> 
> This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page operation
> after dma_alloc_coherent") to the mlx4 driver. That change was described
> as:
> 
> > In general, dma_alloc_coherent() returns a CPU virtual address and a
> > DMA address, and we have no guarantee that the underlying memory even
> > has an associated struct page at all.
> >
> > This patch gets rid of the page operation after dma_alloc_coherent,
> > and records the VA returned from dma_alloc_coherent in the struct of
> > hem in hns RoCE driver.
> 

I need to review this patch first.

> Differences in this port relative to the hns patch:
> 
> 1) The hns patch only needed to fix a dma_alloc_coherent path, but this
> patch also needs to fix an alloc_pages path. This appears to be simple except
> for the next point.
> 
> 2) The hns patch converted a bunch of code to consistently use
> sg_dma_len(mem) rather than a mix of that and mem->length. However, it
> seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
> using it when calling e.g. __free_pages is problematic. I suspect the same
> issue affects mlx4_table_find() somehow too, and similarly I expect the hns
> driver has an issue in this area. Instead, this patch converts everything to use
> mem->length instead. I'd like some feedback on this issue.
> 

OK. I will carefully go over this point.

Thanks.



* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-14 23:32 [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent Stephen Warren
  2018-12-18 13:40 ` Tariq Toukan
@ 2018-12-18 16:32 ` Jason Gunthorpe
  2018-12-18 17:08   ` Stephen Warren
  1 sibling, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2018-12-18 16:32 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Tariq Toukan, Wei Hu (Xavier),
	netdev, linux-rdma, Doug Ledford, Stephen Warren

On Fri, Dec 14, 2018 at 04:32:54PM -0700, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page
> operation after dma_alloc_coherent") to the mlx4 driver. That change was
> described as:
> 
> > In general, dma_alloc_coherent() returns a CPU virtual address and
> > a DMA address, and we have no guarantee that the underlying memory
> > even has an associated struct page at all.
> >
> > This patch gets rid of the page operation after dma_alloc_coherent,
> > and records the VA returned from dma_alloc_coherent in the struct
> > of hem in hns RoCE driver.
> 
> Differences in this port relative to the hns patch:
> 
> 1) The hns patch only needed to fix a dma_alloc_coherent path, but this
> patch also needs to fix an alloc_pages path. This appears to be simple
> except for the next point.
> 
> 2) The hns patch converted a bunch of code to consistently use
> > sg_dma_len(mem) rather than a mix of that and mem->length. However, it
> seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
> using it when calling e.g. __free_pages is problematic.

dma_len should only ever be used when programming a HW device to do
DMA. It certainly should never be used for anything else, so I'm not
sure why this description veered off into talking about alloc_pages?

If pages were allocated and described in a sg list then the CPU side
must use the pages/len part of the SGL to walk that list of pages.
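
Roughly, and purely for illustration (sgl/nents/mapped_nents and the
output arrays below are made-up placeholders, not mlx4 code):

#include <linux/scatterlist.h>
#include <linux/string.h>

/* CPU side: walk the page/offset/length view of the SGL. */
static void zero_sgl_cpu_side(struct scatterlist *sgl, int nents)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i)
		memset(sg_virt(sg), 0, sg->length);	/* assumes lowmem pages */
}

/* Device side: dma_address/dma_len exist only to program the HW, and only
 * after dma_map_sg(); mapped_nents is dma_map_sg()'s return value. */
static void sgl_hw_view(struct scatterlist *sgl, int mapped_nents,
			dma_addr_t *addrs, unsigned int *lens)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, mapped_nents, i) {
		addrs[i] = sg_dma_address(sg);
		lens[i]  = sg_dma_len(sg);
	}
}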

I also don't really see a practical problem with putting the virtual
address pointer of DMA coherent memory in the SGL, so long as it is
never used in a DMA map operation or otherwise.

.. so again, what is it this is actually trying to fix in mlx4?

Jason


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 16:32 ` Jason Gunthorpe
@ 2018-12-18 17:08   ` Stephen Warren
  2018-12-18 17:12     ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Warren @ 2018-12-18 17:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tariq Toukan, xavier.huwei, netdev, linux-rdma, Doug Ledford

On 12/18/18 9:32 AM, Jason Gunthorpe wrote:
> On Fri, Dec 14, 2018 at 04:32:54PM -0700, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page
>> operation after dma_alloc_coherent") to the mlx4 driver. That change was
>> described as:
>>
>>> In general, dma_alloc_coherent() returns a CPU virtual address and
>>> a DMA address, and we have no guarantee that the underlying memory
>>> even has an associated struct page at all.
>>>
>>> This patch gets rid of the page operation after dma_alloc_coherent,
>>> and records the VA returned from dma_alloc_coherent in the struct
>>> of hem in hns RoCE driver.
>>
>> Differences in this port relative to the hns patch:
>>
>> 1) The hns patch only needed to fix a dma_alloc_coherent path, but this
>> patch also needs to fix an alloc_pages path. This appears to be simple
>> except for the next point.
>>
>> 2) The hns patch converted a bunch of code to consistently use
>> sg_dma_len(mem) rather than a mix of that and mem->length. However, it
>> seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
>> using it when calling e.g. __free_pages is problematic.
> 
> dma_len should only ever be used when programming a HW device to do
> DMA. It certainly should never be used for anything else, so I'm not
> sure why this description veered off into talking about alloc_pages?
> 
> If pages were allocated and described in a sg list then the CPU side
> must use the pages/len part of the SGL to walk that list of pages.
> 
> I also don't really see a practical problem with putting the virtual
> address pointer of DMA coherent memory in the SGL, so long as it is
> never used in a DMA map operation or otherwise.
> 
> .. so again, what is it this is actually trying to fix in mlx4?

The same thing that the original hns patch fixed, and in the exact same 
way. Namely a crash during driver unload or system shutdown in the path 
that frees allocated memory contained in the sg list.

The reason is that the allocation does:

static int mlx4_alloc_icm_coherent(...
...
         void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
                                        &sg_dma_address(mem), gfp_mask);
...
         sg_set_buf(mem, buf, PAGE_SIZE << order);
         sg_dma_len(mem) = PAGE_SIZE << order;

And free does:

static void mlx4_free_icm_coherent(...
...
     dma_free_coherent(&dev->persist->pdev->dev,
                       chunk->mem[i].length,
                       lowmem_page_address(sg_page(&chunk->mem[i])),

However, there's no guarantee that dma_alloc_coherent() returned memory 
for which a struct page exists, and hence the call to sg_page() and/or 
lowmem_page_address() can fail. To fix this, we add a second field to 
the mlx4 ICM chunk struct which holds the return value from
dma_alloc_coherent() so that value can be passed to dma_free_coherent() 
directly, rather than trying to re-derive the value in 
mlx4_free_icm_coherent().
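
In condensed form the pattern is (names simplified for illustration; the
actual patch adds a buf[] array to struct mlx4_icm_chunk rather than a
wrapper struct like this):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>

struct icm_buf {
	struct scatterlist	sg;	/* DMA address/length for the HW side */
	void			*va;	/* CPU VA from dma_alloc_coherent() */
};

static int icm_buf_alloc(struct device *dev, struct icm_buf *buf, size_t size)
{
	buf->va = dma_alloc_coherent(dev, size, &sg_dma_address(&buf->sg),
				     GFP_KERNEL);
	if (!buf->va)
		return -ENOMEM;
	sg_dma_len(&buf->sg) = size;
	buf->sg.length = size;	/* no sg_set_buf(): can't assume a struct page */
	return 0;
}

static void icm_buf_free(struct device *dev, struct icm_buf *buf, size_t size)
{
	/* Free with the recorded VA, not lowmem_page_address(sg_page(...)). */
	dma_free_coherent(dev, size, buf->va, sg_dma_address(&buf->sg));
}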


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 17:08   ` Stephen Warren
@ 2018-12-18 17:12     ` Jason Gunthorpe
  2018-12-18 17:45       ` Stephen Warren
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2018-12-18 17:12 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Tariq Toukan, xavier.huwei, netdev, linux-rdma, Doug Ledford

On Tue, Dec 18, 2018 at 10:08:56AM -0700, Stephen Warren wrote:
> On 12/18/18 9:32 AM, Jason Gunthorpe wrote:
> > On Fri, Dec 14, 2018 at 04:32:54PM -0700, Stephen Warren wrote:
> > > From: Stephen Warren <swarren@nvidia.com>
> > > 
> > > This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page
> > > operation after dma_alloc_coherent") to the mlx4 driver. That change was
> > > described as:
> > > 
> > > > In general, dma_alloc_coherent() returns a CPU virtual address and
> > > > a DMA address, and we have no guarantee that the underlying memory
> > > > even has an associated struct page at all.
> > > > 
> > > > This patch gets rid of the page operation after dma_alloc_coherent,
> > > > and records the VA returned from dma_alloc_coherent in the struct
> > > > of hem in hns RoCE driver.
> > > 
> > > Differences in this port relative to the hns patch:
> > > 
> > > 1) The hns patch only needed to fix a dma_alloc_coherent path, but this
> > > patch also needs to fix an alloc_pages path. This appears to be simple
> > > except for the next point.
> > > 
> > > 2) The hns patch converted a bunch of code to consistently use
> > > sg_dma_len(mem) rather than a mix of that and mem->length. However, it
> > > seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
> > > using it when calling e.g. __free_pages is problematic.
> > 
> > dma_len should only ever be used when programming a HW device to do
> > DMA. It certainly should never be used for anything else, so I'm not
> > sure why this description veered off into talking about alloc_pages?
> > 
> > If pages were allocated and described in a sg list then the CPU side
> > must use the pages/len part of the SGL to walk that list of pages.
> > 
> > I also don't really see a practical problem with putting the virtual
> > address pointer of DMA coherent memory in the SGL, so long as it is
> > never used in a DMA map operation or otherwise.
> > 
> > .. so again, what is it this is actually trying to fix in mlx4?
> 
> The same thing that the original hns patch fixed, and in the exact same way.
> Namely a crash during driver unload or system shutdown in the path that
> frees allocated memory contained in the sg list.
> 
> The reason is that the allocation does:
> 
> static int mlx4_alloc_icm_coherent(...
> ...
>         void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
>                                        &sg_dma_address(mem), gfp_mask);
> ...
>         sg_set_buf(mem, buf, PAGE_SIZE << order);
>         sg_dma_len(mem) = PAGE_SIZE << order;
> 
> And free does:
> 
> static void mlx4_free_icm_coherent(...
> ...
>     dma_free_coherent(&dev->persist->pdev->dev,
>                       chunk->mem[i].length,
>                       lowmem_page_address(sg_page(&chunk->mem[i])),
> 
> However, there's no guarantee that dma_alloc_coherent() returned memory for
> which a struct page exists

> and hence the call to sg_page() and/or lowmem_page_address() can
> fail.

This is a much better explanation than what was in the patch commit
message, please revise it.

> To fix this, we add a second field to the mlx4 ICM chunk struct which
> holds the return value from dma_alloc_coherent() so that value can
> be passed to dma_free_coherent() directly, rather than trying to
> re-derive the value in mlx4_free_icm_coherent().

That seems reasonable, but why did the commit message start talking
about alloc_pages then?

Jason


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 17:12     ` Jason Gunthorpe
@ 2018-12-18 17:45       ` Stephen Warren
  2018-12-18 18:43         ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Warren @ 2018-12-18 17:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tariq Toukan, xavier.huwei, netdev, linux-rdma, Doug Ledford

On 12/18/18 10:12 AM, Jason Gunthorpe wrote:
> On Tue, Dec 18, 2018 at 10:08:56AM -0700, Stephen Warren wrote:
>> On 12/18/18 9:32 AM, Jason Gunthorpe wrote:
>>> On Fri, Dec 14, 2018 at 04:32:54PM -0700, Stephen Warren wrote:
>>>> From: Stephen Warren <swarren@nvidia.com>
>>>>
>>>> This is a port of commit 378efe798ecf ("RDMA/hns: Get rid of page
>>>> operation after dma_alloc_coherent") to the mlx4 driver. That change was
>>>> described as:
>>>>
>>>>> In general, dma_alloc_coherent() returns a CPU virtual address and
>>>>> a DMA address, and we have no guarantee that the underlying memory
>>>>> even has an associated struct page at all.
>>>>>
>>>>> This patch gets rid of the page operation after dma_alloc_coherent,
>>>>> and records the VA returned from dma_alloc_coherent in the struct
>>>>> of hem in hns RoCE driver.
>>>>
>>>> Differences in this port relative to the hns patch:
>>>>
>>>> 1) The hns patch only needed to fix a dma_alloc_coherent path, but this
>>>> patch also needs to fix an alloc_pages path. This appears to be simple
>>>> except for the next point.
>>>>
>>>> 2) The hns patch converted a bunch of code to consistently use
>>>> sg_dma_len(mem) rather than a mix of that and mem->length. However, it
>>>> seems that sg_dma_len(mem) can be modified or zeroed at runtime, and so
>>>> using it when calling e.g. __free_pages is problematic.
>>>
>>> dma_len should only ever be used when programming a HW device to do
>>> DMA. It certainly should never be used for anything else, so I'm not
>>> sure why this description veered off into talking about alloc_pages?
>>>
>>> If pages were allocated and described in a sg list then the CPU side
>>> must use the pages/len part of the SGL to walk that list of pages.
>>>
>>> I also don't really see a practical problem with putting the virtual
>>> address pointer of DMA coherent memory in the SGL, so long as it is
>>> never used in a DMA map operation or otherwise.
>>>
>>> .. so again, what is it this is actually trying to fix in mlx4?
>>
>> The same thing that the original hns patch fixed, and in the exact same way.
>> Namely a crash during driver unload or system shutdown in the path that
>> frees allocated memory contained in the sg list.
>>
>> The reason is that the allocation does:
>>
>> static int mlx4_alloc_icm_coherent(...
>> ...
>>          void *buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
>>                                         &sg_dma_address(mem), gfp_mask);
>> ...
>>          sg_set_buf(mem, buf, PAGE_SIZE << order);
>>          sg_dma_len(mem) = PAGE_SIZE << order;
>>
>> And free does:
>>
>> static void mlx4_free_icm_coherent(...
>> ...
>>      dma_free_coherent(&dev->persist->pdev->dev,
>>                        chunk->mem[i].length,
>>                        lowmem_page_address(sg_page(&chunk->mem[i])),
>>
>> However, there's no guarantee that dma_alloc_coherent() returned memory for
>> which a struct page exists
> 
>> and hence the call to sg_page() and/or lowmem_page_address() can
>> fail.
> 
> This is a much better explanation than what was in the patch commit
> message, please revise it.
> 
>> To fix this, we add a second field to the mlx4 ICM chunk struct which
>> holds the return value from dma_alloc_coherent() so that value can
>> be passed to dma_free_coherent() directly, rather than trying to
>> re-derive the value in mlx4_free_icm_coherent().
> 
> That seems reasonable, but why did the commit message start talking
> about alloc_pages then?

There are two allocation paths: one using dma_alloc_coherent() and one
using alloc_pages(). (The hns driver only has the dma_alloc_coherent()
path.) Both store their allocations into an sg list held in a table, and
that table is searched by a single function, mlx4_table_find(),
irrespective of which allocation path was used. So if one allocation path
is updated to store the CPU virtual address differently, the other must
be updated to match, so that the single table search path can keep a
single implementation.
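
Condensed for illustration (the real code keeps the two allocators as the
separate helpers mlx4_alloc_icm_coherent() and mlx4_alloc_icm_pages(), and
the non-coherent entries are still dma_map_sg()'d later, exactly as today):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include "icm.h"	/* struct mlx4_icm_chunk, with the new buf[] array */

static int icm_chunk_fill_entry(struct device *dev,
				struct mlx4_icm_chunk *chunk, int order,
				gfp_t gfp, int node, bool coherent)
{
	struct scatterlist *mem = &chunk->mem[chunk->npages];
	void **buf = &chunk->buf[chunk->npages];

	if (coherent) {
		*buf = dma_alloc_coherent(dev, PAGE_SIZE << order,
					  &sg_dma_address(mem), gfp);
		if (!*buf)
			return -ENOMEM;
		sg_dma_len(mem) = PAGE_SIZE << order;
		mem->length = PAGE_SIZE << order;
	} else {
		struct page *page = alloc_pages_node(node, gfp, order);

		if (!page)
			return -ENOMEM;
		sg_set_page(mem, page, PAGE_SIZE << order, 0);
		*buf = lowmem_page_address(page); /* lowmem page: VA well defined */
	}

	/* Either way, mlx4_table_find() returns chunk->buf[i] + offset. */
	return 0;
}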


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 17:45       ` Stephen Warren
@ 2018-12-18 18:43         ` Jason Gunthorpe
  2018-12-18 18:50           ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2018-12-18 18:43 UTC (permalink / raw)
  To: Stephen Warren, Christoph Hellwig
  Cc: Tariq Toukan, xavier.huwei, netdev, linux-rdma, Doug Ledford

On Tue, Dec 18, 2018 at 10:45:21AM -0700, Stephen Warren wrote:

> There are two allocation paths; one using dma_alloc_coherent and one using
> alloc_pages. (The hns driver only has the dma_alloc_coherent path.) These
> both store allocations into an sg list which is stored in a table, and that
> table is searched by a single function mlx4_table_find() irrespective of
> which allocation path was used, so if one of the allocation paths is updated
> to store the CPU virtual address differently, then both paths need to be
> updated so they match, so that the single table search path can continue to
> have a single implementation.

So the problem here is that on some arches

 sg_set_buf(sg, dma_coherent_buf, size)
 p = sg_virt(sg);
 assert(p == dma_coherent_buf);

Doesn't work or crashes? Is this how sgl should work?

But if you accept this and don't do the sg_set_buf then the
scatterlist is substantially broken, many of the APIs related to it
will not work as expected at all.

So, I don't think drivers should create such a broken scatterlist or
arches should not have this problem (ie the mathematical
transformation to struct page * and back to virtual address should
work in the coherent space even if there are no backing struct pages
allocated)?

What do you think CH?

Jason


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 18:43         ` Jason Gunthorpe
@ 2018-12-18 18:50           ` Christoph Hellwig
  2018-12-18 18:51             ` David Miller
  2018-12-18 19:04             ` Jason Gunthorpe
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-12-18 18:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Stephen Warren, Christoph Hellwig, Tariq Toukan, xavier.huwei,
	netdev, linux-rdma, Doug Ledford

On Tue, Dec 18, 2018 at 11:43:08AM -0700, Jason Gunthorpe wrote:
> So the problem here is that on some arches
> 
>  sg_set_buf(sg, dma_coherent_buf, size)
>  p = sg_virt(sg);
>  assert(p == dma_coherent_buf);

dma allocations purely return a virtual address, you must never
call virt_to_page or virt_to_phys on them, which sg_set_buf
will do.  On many architectures this will give you the wrong
result as the coherent DMA address is a vmap or ioremap address.
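
For reference, sg_set_buf() is roughly this (paraphrased from
include/linux/scatterlist.h):

static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
			      unsigned int buflen)
{
#ifdef CONFIG_DEBUG_SG
	BUG_ON(!virt_addr_valid(buf));
#endif
	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
}

i.e. it is only valid for addresses in the kernel's linear mapping, where
virt_to_page() is well defined.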


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 18:50           ` Christoph Hellwig
@ 2018-12-18 18:51             ` David Miller
  2018-12-18 19:04             ` Jason Gunthorpe
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2018-12-18 18:51 UTC (permalink / raw)
  To: hch; +Cc: jgg, swarren, tariqt, xavier.huwei, netdev, linux-rdma, dledford

From: Christoph Hellwig <hch@lst.de>
Date: Tue, 18 Dec 2018 19:50:12 +0100

> On Tue, Dec 18, 2018 at 11:43:08AM -0700, Jason Gunthorpe wrote:
>> So the problem here is that on some arches
>> 
>>  sg_set_buf(sg, dma_coherent_buf, size)
>>  p = sg_virt(sg);
>>  assert(p == dma_coherent_buf);
> 
> dma allocations purely return a virtual address, you must never
> call virt_to_page or virt_to_phys on them, which sg_set_buf
> will do.  On many architectures this will give you the wrong
> result as the coherent DMA address is a vmap or ioremap address.

Correct.

And I don't think it makes sense to create fake pages for these
remapped areas to resolve to.


* Re: [RFC PATCH] net/mlx4: Get rid of page operation after dma_alloc_coherent
  2018-12-18 18:50           ` Christoph Hellwig
  2018-12-18 18:51             ` David Miller
@ 2018-12-18 19:04             ` Jason Gunthorpe
  1 sibling, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2018-12-18 19:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Stephen Warren, Tariq Toukan, xavier.huwei, netdev, linux-rdma,
	Doug Ledford

On Tue, Dec 18, 2018 at 07:50:12PM +0100, Christoph Hellwig wrote:
> On Tue, Dec 18, 2018 at 11:43:08AM -0700, Jason Gunthorpe wrote:
> > So the problem here is that on some arches
> > 
> >  sg_set_buf(sg, dma_coherent_buf, size)
> >  p = sg_virt(sg);
> >  assert(p == dma_coherent_buf);
> 
> dma allocations purely return a virtual address, you must never
> call virt_to_page or virt_to_phys on them, which sg_set_buf
> will do.  On many architectures this will give you the wrong
> result as the coherent DMA address is a vmap or ioremap address.

Yes, that is what I gathered - if that is the design then I'd say that
drivers shouldn't be stuffing these DMA coherent virtual addresses
into a sg at all.

Jason

