Linux-RDMA Archive on lore.kernel.org
From: Jason Gunthorpe <jgg@nvidia.com>
To: "Ursulin, Tvrtko" <tvrtko.ursulin@intel.com>,
	<tvrtko.ursulin@linux.intel.com>
Cc: Maor Gottlieb <maorg@nvidia.com>, Christoph Hellwig <hch@lst.de>,
	"Gal Pressman" <galpress@amazon.com>,
	Bob Pearson <rpearsonhpe@gmail.com>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	<linux-rdma@vger.kernel.org>
Subject: Re: dynamic-sg patch has broken rdma_rxe
Date: Fri, 16 Oct 2020 08:58:31 -0300
Message-ID: <20201016115831.GI6219@nvidia.com>
In-Reply-To: <796ca31aed8f469c957cb850385b9d09@intel.com>

On Fri, Oct 16, 2020 at 08:29:11AM +0000, Ursulin, Tvrtko wrote:
> 
> Hi guys,
> 
> [I removed the mailing list from cc since from this email address I
> can't reply properly inline. (tvrtko.ursulin@linux.intel.com works
> better.)]

I put it back
 
> However:
> 
> +	/* Avoid overflow when computing sg_len + PAGE_SIZE */
> +	max_segment = max_segment & PAGE_MASK;
> +	if (WARN_ON(max_segment < PAGE_SIZE))
>  		return ERR_PTR(-EINVAL);
> 
> Maybe it's too early for me but I don't get this. It appears the
> condition can only be true if max_segment is smaller than page
> size as passed into the function to _start with_. I don't see what
> filtering out the low bits achieves on top.

The entire problem is the algorithm in __sg_alloc_table_from_pages()
only limits sg_len to

   sg_len == N * PAGE_SIZE <= ALIGN_UP(max_segment, PAGE_SIZE);

ie it overshoots max_segment if it is unaligned.

It also badly malfunctions if the ALIGN_UP() overflows, eg for
ALIGN_UP(UINT_MAX).

These are all internal problems inside __sg_alloc_table_from_pages()
and have nothing to do with the scatter lists themselves.

Adding an ALIGN_DOWN guarantees this algorithm produces sg_len <=
max_segment in all cases.

> If the intent is to allow unaligned max_segment then also please
> change kerneldoc.

Sure
 
> Although TBH I don't get how an unaligned max segment makes sense. A
> list can end on an unaligned segment but surely shouldn't have them
> in the middle.

The max_segment should either be UINT_MAX because the caller doesn't
care, or come from the DMA max_segment_size which is a HW limitation
usually derived from the # of bits available to express a length.

Conflating the HW limitation with the system PAGE_SIZE is
nonsense. This is further confused because the only reason we have an
alignment restriction is this algorithm's design; the SGL rules don't
prevent the use of unaligned lengths, or lengths smaller than
PAGE_SIZE, even in the interior of the list.

Jason

From b03302028893ce7465ba7e8736abba1922469bc1 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@nvidia.com>
Date: Fri, 16 Oct 2020 08:46:01 -0300
Subject: [PATCH] lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED
 values

The main intention of the max_segment argument to
__sg_alloc_table_from_pages() is to match the DMA layer segment size set
by dma_set_max_seg_size().

Restricting the input to be page aligned makes it impossible to just
connect the DMA layer to this API.

The only reason for a page alignment here is because the algorithm will
overshoot the max_segment if it is not a multiple of PAGE_SIZE. Simply fix
the alignment before starting and don't expose this implementation detail
to the callers.

A future patch will completely remove SCATTERLIST_MAX_SEGMENT.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 lib/scatterlist.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e102fdfaa75be7..ed2497c79a216b 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -404,7 +404,7 @@ static struct scatterlist *get_next_sg(struct sg_table *table,
  * @n_pages:	 Number of pages in the pages array
  * @offset:      Offset from start of the first page to the start of a buffer
  * @size:        Number of valid bytes in the buffer (after offset)
- * @max_segment: Maximum size of a scatterlist node in bytes (page aligned)
+ * @max_segment: Maximum size of a scatterlist element in bytes
  * @prv:	 Last populated sge in sgt
  * @left_pages:  Left pages caller have to set after this call
  * @gfp_mask:	 GFP allocation mask
@@ -435,7 +435,12 @@ struct scatterlist *__sg_alloc_table_from_pages(struct sg_table *sgt,
 	unsigned int added_nents = 0;
 	struct scatterlist *s = prv;
 
-	if (WARN_ON(!max_segment || offset_in_page(max_segment)))
+	/*
+	 * The algorithm below requires max_segment to be aligned to PAGE_SIZE
+	 * otherwise it can overshoot.
+	 */
+	max_segment = ALIGN_DOWN(max_segment, PAGE_SIZE);
+	if (WARN_ON(max_segment < PAGE_SIZE))
 		return ERR_PTR(-EINVAL);
 
 	if (IS_ENABLED(CONFIG_ARCH_NO_SG_CHAIN) && prv)
@@ -542,8 +547,7 @@ int sg_alloc_table_from_pages(struct sg_table *sgt, struct page **pages,
 			      unsigned long size, gfp_t gfp_mask)
 {
 	return PTR_ERR_OR_ZERO(__sg_alloc_table_from_pages(sgt, pages, n_pages,
-			offset, size, SCATTERLIST_MAX_SEGMENT,
-			NULL, 0, gfp_mask));
+			offset, size, UINT_MAX, NULL, 0, gfp_mask));
 }
 EXPORT_SYMBOL(sg_alloc_table_from_pages);
 
-- 
2.28.0


