linux-block.vger.kernel.org archive mirror
* [PATCH v6 0/4] scatterlist: add new capabilities
@ 2021-01-18 16:30 Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning Douglas Gilbert
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 16:30 UTC
  To: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel
  Cc: martin.petersen, jejb, bostroesser, ddiss, bvanassche, jgg

Scatter-gather lists (sgl_s) are frequently used as data carriers in
the block layer. For example the SCSI and NVMe subsystems interchange
data with the block layer using sgl_s. The sgl API is declared in
<linux/scatterlist.h>.

The author has extended these transient sgl use cases to a store (i.e.
a ramdisk) in the scsi_debug driver. Other new potential uses of sgl_s
could be for the target subsystem. When this extra step is taken, the
need to copy between sgl_s becomes apparent. The patchset adds
sgl_copy_sgl(), sgl_compare_sgl() and sgl_memset().

The existing sgl_alloc_order() function can be seen as a replacement
for vmalloc() for large, long-term allocations.  For what seems like
no good reason, sgl_alloc_order() currently restricts its total
allocation to less than or equal to 4 GiB. vmalloc() has no such
restriction.

Changes since v5 [posted 20201228]:
  - incorporate review requests from Jason Gunthorpe
  - replace integer overflow detection code in sgl_alloc_order()
    with a pre-condition statement
  - rebase on lk 5.11.0-rc4

Changes since v4 [posted 20201105]:
  - rebase on lk 5.10.0-rc2

Changes since v3 [posted 20201019]:
  - re-instate check on integer overflow of nent calculation in
    sgl_alloc_order(). Do it in such a way as to not limit the
    overall sgl size to 4 GiB
  - introduce sgl_compare_sgl_idx() helper function that, if
    requested and if a miscompare is detected, will yield the byte
    index of the first miscompare.
  - add Reviewed-by tags from Bodo Stroesser
  - rebase on lk 5.10.0-rc2 [was on lk 5.9.0]

Changes since v2 [posted 20201018]:
  - remove unneeded lines from sgl_memset() definition.
  - change sg_zero_buffer() to call sgl_memset() as the former
    is a subset.

Changes since v1 [posted 20201016]:
  - Bodo Stroesser pointed out a problem with the nesting of
    kmap_atomic() [called via sg_miter_next()] and kunmap_atomic()
    calls [called via sg_miter_stop()] and proposed a solution that
    simplifies the previous code.

  - the new implementation of the three functions has shorter periods
    when pre-emption is disabled (but has more of them). This should
    make operations on large sgl_s more pre-emption "friendly" with
    a relatively small performance hit.

  - sgl_memset return type changed from void to size_t and is the
    number of bytes actually (over)written. That number is needed
    anyway internally so may as well return it as it may be useful to
    the caller.

This patchset is against lk 5.11.0-rc4

Douglas Gilbert (4):
  sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  scatterlist: add sgl_copy_sgl() function
  scatterlist: add sgl_compare_sgl() function
  scatterlist: add sgl_memset()

 include/linux/scatterlist.h |  33 ++++-
 lib/scatterlist.c           | 253 +++++++++++++++++++++++++++++++-----
 2 files changed, 253 insertions(+), 33 deletions(-)

-- 
2.25.1



* [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 16:30 [PATCH v6 0/4] scatterlist: add new capabilities Douglas Gilbert
@ 2021-01-18 16:30 ` Douglas Gilbert
  2021-01-18 18:28   ` Jason Gunthorpe
  2021-01-18 16:30 ` [PATCH v6 2/4] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 16:30 UTC
  To: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel
  Cc: martin.petersen, jejb, bostroesser, ddiss, bvanassche, jgg

This patch fixes a check done by sgl_alloc_order() before it starts
any allocations. The comment in the original said: "Check for integer
overflow" but the check itself contained an integer overflow! The
right hand side (rhs) of the expression in the condition is resolved
as u32 so it could not exceed UINT32_MAX (4 GiB) which means 'length'
could not exceed that value. If that was the intention then the
comment above it could be dropped and the condition rewritten more
clearly as:
     if (length > UINT32_MAX) <<failure path >>;
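
The wrap described above can be reproduced outside the kernel. The
following is only a userspace sketch: sk_old_check_fails() and
SK_PAGE_SHIFT are made-up names, 4 KiB pages are assumed, and nent is
modelled as a 32 bit unsigned value as in the original function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Userspace model of the pre-patch check in sgl_alloc_order().
 * Returns true when the old code would have taken the failure path.
 */
static bool sk_old_check_fails(uint64_t length, unsigned int order)
{
	unsigned int shift = SK_PAGE_SHIFT + order;
	uint64_t elem_len, rounded;
	uint32_t nent, rhs;

	assert(shift < 32);	/* keep the 32 bit shift below well defined */
	elem_len = (uint64_t)1 << shift;
	/* round_up(length, elem_len); assumes length + elem_len does not wrap */
	rounded = (length + elem_len - 1) & ~(elem_len - 1);
	nent = (uint32_t)(rounded >> shift);
	/* The shift is evaluated in 32 bits, so the product wraps at 4 GiB */
	rhs = nent << shift;
	return length > rhs;
}
```

With these assumptions a 1 GiB request passes the old check while a
4 GiB request is rejected, because the right hand side has wrapped to 0.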

After several flawed attempts to detect overflow, take the fastest
route by stating as a pre-condition that the 'order' function argument
cannot exceed 16 (2^16 * 4k = 256 MiB).

This function may be used to replace vmalloc(unsigned long) for a
large allocation (e.g. a ramdisk). vmalloc has no limit at 4 GiB so
it seems unreasonable that:
    sgl_alloc_order(unsigned long long length, ....)
does. sgl_s made with sgl_alloc_order() have equally sized segments
placed in a scatter gather array. That allows O(1) navigation around
a big sgl using some simple integer arithmetic.
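
As an illustration of that arithmetic, here is a userspace sketch. The
names sk_pos, sk_locate() and SK_PAGE_SHIFT are hypothetical, 4 KiB
pages are assumed, and every element is assumed to hold exactly
PAGE_SIZE * 2^order bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Position inside an sgl whose elements all have the same size */
struct sk_pos {
	size_t elem_idx;	/* which element of the sgl array */
	size_t elem_off;	/* byte offset inside that element */
};

/* Map a byte offset into the whole sgl to (element, offset) in O(1) */
static struct sk_pos sk_locate(uint64_t byte_off, unsigned int order)
{
	unsigned int shift = SK_PAGE_SHIFT + order;
	struct sk_pos p;

	assert(shift < 64);
	p.elem_idx = (size_t)(byte_off >> shift);
	p.elem_off = (size_t)(byte_off & (((uint64_t)1 << shift) - 1));
	return p;
}
```

For example, with order 2 (16 KiB elements) byte offset 0x12345 lands
in element 4 at offset 0x2345.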

Revise some of this function's description to more accurately reflect
what this function is doing.

An earlier patch fixed a memory leak in sgl_alloc_order() due to the
misuse of sgl_free(). Take the opportunity to put a one line comment
above sgl_free()'s declaration warning that it is not suitable when
order > 0 .

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |  1 +
 lib/scatterlist.c           | 21 ++++++++++-----------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6f70572b2938..8adff41f7cfa 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -302,6 +302,7 @@ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
 			      unsigned int *nent_p);
 void sgl_free_n_order(struct scatterlist *sgl, int nents, int order);
 void sgl_free_order(struct scatterlist *sgl, int order);
+/* Only use sgl_free() when order is 0 */
 void sgl_free(struct scatterlist *sgl);
 #endif /* CONFIG_SGL_ALLOC */
 
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index a59778946404..24ea2d31a405 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -554,13 +554,16 @@ EXPORT_SYMBOL(sg_alloc_table_from_pages);
 #ifdef CONFIG_SGL_ALLOC
 
 /**
- * sgl_alloc_order - allocate a scatterlist and its pages
+ * sgl_alloc_order - allocate a scatterlist with equally sized elements each
+ *		     of which has 2^@order continuous pages
  * @length: Length in bytes of the scatterlist. Must be at least one
- * @order: Second argument for alloc_pages()
+ * @order: Second argument for alloc_pages(). Each sgl element size will
+ *	   be (PAGE_SIZE*2^@order) bytes. @order must not exceed 16.
  * @chainable: Whether or not to allocate an extra element in the scatterlist
- *	for scatterlist chaining purposes
+ *	       for scatterlist chaining purposes
  * @gfp: Memory allocation flags
- * @nent_p: [out] Number of entries in the scatterlist that have pages
+ * @nent_p: [out] Number of entries in the scatterlist that have pages.
+ *		  Ignored if NULL is given.
  *
  * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
  */
@@ -574,15 +577,11 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
 	u32 elem_len;
 
 	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
-	/* Check for integer overflow */
-	if (length > (nent << (PAGE_SHIFT + order)))
-		return NULL;
-	nalloc = nent;
 	if (chainable) {
-		/* Check for integer overflow */
-		if (nalloc + 1 < nalloc)
+		if (check_add_overflow(nent, 1U, &nalloc))
 			return NULL;
-		nalloc++;
+	} else {
+		nalloc = nent;
 	}
 	sgl = kmalloc_array(nalloc, sizeof(struct scatterlist),
 			    gfp & ~GFP_DMA);
-- 
2.25.1



* [PATCH v6 2/4] scatterlist: add sgl_copy_sgl() function
  2021-01-18 16:30 [PATCH v6 0/4] scatterlist: add new capabilities Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning Douglas Gilbert
@ 2021-01-18 16:30 ` Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 4/4] scatterlist: add sgl_memset() Douglas Gilbert
  3 siblings, 0 replies; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 16:30 UTC
  To: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel
  Cc: martin.petersen, jejb, bostroesser, ddiss, bvanassche, jgg

Both the SCSI and NVMe subsystems receive user data from the block
layer in scatterlist_s (aka scatter gather lists (sgl) which are
often arrays). If drivers in those subsystems represent storage
(e.g. a ramdisk) or cache "hot" user data then they may also
choose to use scatterlist_s. Currently there are no sgl to sgl
operations in the kernel. Start with a sgl to sgl copy. The copy
stops as soon as the requested number of bytes has been copied, or
the source sgl or the destination sgl is exhausted, whichever comes
first. So the destination sgl will _not_ grow.
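
The stopping rule can be modelled in userspace. This sketch is not the
kernel code: struct sk_seg stands in for a scatterlist element, and
sk_cur, sk_avail(), sk_skip() and sk_copy() are hypothetical names
playing the role of the sg_miter iterator and sgl_copy_sgl().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sk_seg { unsigned char *buf; size_t len; };
struct sk_cur { const struct sk_seg *segs; size_t nsegs, idx, off; };

/* Bytes left in the current segment; 0 once the list is exhausted */
static size_t sk_avail(struct sk_cur *c)
{
	while (c->idx < c->nsegs && c->off >= c->segs[c->idx].len) {
		c->idx++;
		c->off = 0;
	}
	return c->idx < c->nsegs ? c->segs[c->idx].len - c->off : 0;
}

/* Advance by skip bytes; false if the list ends first */
static bool sk_skip(struct sk_cur *c, size_t skip)
{
	while (skip > 0) {
		size_t a = sk_avail(c);

		if (a == 0)
			return false;
		if (a > skip)
			a = skip;
		c->off += a;
		skip -= a;
	}
	return true;
}

/* Destination first, as with memcpy(); returns the bytes copied */
static size_t sk_copy(const struct sk_seg *d, size_t d_n, size_t d_skip,
		      const struct sk_seg *s, size_t s_n, size_t s_skip,
		      size_t n_bytes)
{
	struct sk_cur dc = { d, d_n, 0, 0 }, sc = { s, s_n, 0, 0 };
	size_t done = 0;

	assert(d != NULL && s != NULL);
	if (!sk_skip(&sc, s_skip) || !sk_skip(&dc, d_skip))
		return 0;
	while (done < n_bytes) {
		size_t da = sk_avail(&dc), sa = sk_avail(&sc), len;

		if (da == 0 || sa == 0)
			break;		/* one of the lists is exhausted */
		len = da < sa ? da : sa;
		if (len > n_bytes - done)
			len = n_bytes - done;
		memcpy(dc.segs[dc.idx].buf + dc.off,
		       sc.segs[sc.idx].buf + sc.off, len);
		dc.off += len;
		sc.off += len;
		done += len;
	}
	return done;
}
```

Copying from an 8 byte source (skip 2) into an 8 byte destination
(skip 1) with a large n_bytes moves 6 bytes: the source runs out first
and the destination is not extended.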

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |  4 ++
 lib/scatterlist.c           | 74 +++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 8adff41f7cfa..3f836a3246aa 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -321,6 +321,10 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
 		       size_t buflen, off_t skip);
 
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+		    size_t n_bytes);
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 24ea2d31a405..c06f8caaff91 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1057,3 +1057,77 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
 	return offset;
 }
 EXPORT_SYMBOL(sg_zero_buffer);
+
+/**
+ * sgl_copy_sgl - Copy over a destination sgl from a source sgl
+ * @d_sgl:		 Destination sgl
+ * @d_nents:		 Number of SG entries in destination sgl
+ * @d_skip:		 Number of bytes to skip in destination before starting
+ * @s_sgl:		 Source sgl
+ * @s_nents:		 Number of SG entries in source sgl
+ * @s_skip:		 Number of bytes to skip in source before starting
+ * @n_bytes:		 The (maximum) number of bytes to copy
+ *
+ * Returns:
+ *   The number of copied bytes.
+ *
+ * Notes:
+ *   Destination arguments appear before the source arguments, as with memcpy().
+ *
+ *   Stops copying if either d_sgl, s_sgl or n_bytes is exhausted.
+ *
+ *   Since memcpy() is used, overlapping copies (where d_sgl and s_sgl belong
+ *   to the same sgl and the copy regions overlap) are not supported.
+ *
+ *   Large copies are broken into copy segments whose sizes may vary. Those
+ *   copy segment sizes are chosen by the min3() statement in the code below.
+ *   Since SG_MITER_ATOMIC is used for both sides, each copy segment is started
+ *   with kmap_atomic() [in sg_miter_next()] and completed with kunmap_atomic()
+ *   [in sg_miter_stop()]. This means pre-emption is inhibited for relatively
+ *   short periods even in very large copies.
+ *
+ *   If d_skip is large, potentially spanning multiple d_nents then some
+ *   integer arithmetic to adjust d_sgl may improve performance. For example
+ *   if d_sgl is built using sgl_alloc_order(chainable=false) then the sgl
+ *   will be an array with equally sized segments facilitating that
+ *   arithmetic. The suggestion applies to s_skip, s_sgl and s_nents as well.
+ *
+ **/
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+		    size_t n_bytes)
+{
+	size_t len;
+	size_t offset = 0;
+	struct sg_mapping_iter d_iter, s_iter;
+
+	if (n_bytes == 0)
+		return 0;
+	sg_miter_start(&s_iter, s_sgl, s_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	sg_miter_start(&d_iter, d_sgl, d_nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+	if (!sg_miter_skip(&s_iter, s_skip))
+		goto fini;
+	if (!sg_miter_skip(&d_iter, d_skip))
+		goto fini;
+
+	while (offset < n_bytes) {
+		if (!sg_miter_next(&s_iter))
+			break;
+		if (!sg_miter_next(&d_iter))
+			break;
+		len = min3(d_iter.length, s_iter.length, n_bytes - offset);
+
+		memcpy(d_iter.addr, s_iter.addr, len);
+		offset += len;
+		/* LIFO order (stop d_iter before s_iter) needed with SG_MITER_ATOMIC */
+		d_iter.consumed = len;
+		sg_miter_stop(&d_iter);
+		s_iter.consumed = len;
+		sg_miter_stop(&s_iter);
+	}
+fini:
+	sg_miter_stop(&d_iter);
+	sg_miter_stop(&s_iter);
+	return offset;
+}
+EXPORT_SYMBOL(sgl_copy_sgl);
-- 
2.25.1



* [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function
  2021-01-18 16:30 [PATCH v6 0/4] scatterlist: add new capabilities Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning Douglas Gilbert
  2021-01-18 16:30 ` [PATCH v6 2/4] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
@ 2021-01-18 16:30 ` Douglas Gilbert
  2021-01-18 23:27   ` David Disseldorp
  2021-01-18 16:30 ` [PATCH v6 4/4] scatterlist: add sgl_memset() Douglas Gilbert
  3 siblings, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 16:30 UTC
  To: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel
  Cc: martin.petersen, jejb, bostroesser, ddiss, bvanassche, jgg

After enabling copies between scatter gather lists (sgl_s), another
storage related operation is to compare two sgl_s. This new function
is modelled on NVMe's Compare command and the SCSI VERIFY(BYTCHK=1)
command. Like memcmp() this function stops comparing at the first
miscompare; in that case it returns false.

A helper function called sgl_compare_sgl_idx() is added. It takes an
additional parameter (miscompare_idx) which is a pointer. If that
pointer is non-NULL and a miscompare is detected (i.e. the function
returns false) then the byte index of the first miscompare is written
to *miscompare_idx. Knowing the location of the first miscompare is
needed to implement the SCSI COMPARE AND WRITE command properly.
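
How the miscompare index relates to segment boundaries can be shown
with a userspace model. As before this is only a sketch: sk_seg,
sk_cur, sk_avail(), sk_skip() and sk_compare_idx() are hypothetical
stand-ins for the scatterlist, the sg_miter iterator and
sgl_compare_sgl_idx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sk_seg { const unsigned char *buf; size_t len; };
struct sk_cur { const struct sk_seg *segs; size_t nsegs, idx, off; };

/* Bytes left in the current segment; 0 once the list is exhausted */
static size_t sk_avail(struct sk_cur *c)
{
	while (c->idx < c->nsegs && c->off >= c->segs[c->idx].len) {
		c->idx++;
		c->off = 0;
	}
	return c->idx < c->nsegs ? c->segs[c->idx].len - c->off : 0;
}

/* Advance by skip bytes; false if the list ends first */
static bool sk_skip(struct sk_cur *c, size_t skip)
{
	while (skip > 0) {
		size_t a = sk_avail(c);

		if (a == 0)
			return false;
		if (a > skip)
			a = skip;
		c->off += a;
		skip -= a;
	}
	return true;
}

static bool sk_compare_idx(const struct sk_seg *x, size_t x_n, size_t x_skip,
			   const struct sk_seg *y, size_t y_n, size_t y_skip,
			   size_t n_bytes, size_t *miscompare_idx)
{
	struct sk_cur xc = { x, x_n, 0, 0 }, yc = { y, y_n, 0, 0 };
	size_t done = 0;

	assert(x != NULL && y != NULL);
	if (!sk_skip(&xc, x_skip) || !sk_skip(&yc, y_skip))
		return true;	/* as in the patch: nothing was compared */
	while (done < n_bytes) {
		size_t xa = sk_avail(&xc), ya = sk_avail(&yc), len, i;
		const unsigned char *xp, *yp;

		if (xa == 0 || ya == 0)
			break;
		len = xa < ya ? xa : ya;
		if (len > n_bytes - done)
			len = n_bytes - done;
		xp = xc.segs[xc.idx].buf + xc.off;
		yp = yc.segs[yc.idx].buf + yc.off;
		if (memcmp(xp, yp, len) != 0) {
			/* memcmp() found a difference, so this loop ends */
			for (i = 0; xp[i] == yp[i]; i++)
				;
			if (miscompare_idx)
				*miscompare_idx = done + i;
			return false;
		}
		xc.off += len;
		yc.off += len;
		done += len;
	}
	return true;
}
```

The index is counted from the first compared byte (i.e. after the
skips), independent of how either list is split into segments.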

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |   8 +++
 lib/scatterlist.c           | 109 ++++++++++++++++++++++++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 3f836a3246aa..71be65f9ebb5 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -325,6 +325,14 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
 		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
 		    size_t n_bytes);
 
+bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		     struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		     size_t n_bytes);
+
+bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+			 size_t n_bytes, size_t *miscompare_idx);
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c06f8caaff91..e3182de753d0 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1131,3 +1131,112 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
 	return offset;
 }
 EXPORT_SYMBOL(sgl_copy_sgl);
+
+/**
+ * sgl_compare_sgl_idx - Compare x and y (both sgl_s)
+ * @x_sgl:		 x (left) sgl
+ * @x_nents:		 Number of SG entries in x (left) sgl
+ * @x_skip:		 Number of bytes to skip in x (left) before starting
+ * @y_sgl:		 y (right) sgl
+ * @y_nents:		 Number of SG entries in y (right) sgl
+ * @y_skip:		 Number of bytes to skip in y (right) before starting
+ * @n_bytes:		 The (maximum) number of bytes to compare
+ * @miscompare_idx:	 if return is false, index of first miscompare written
+ *			 to this pointer (if non-NULL). Value will be < n_bytes
+ *
+ * Returns:
+ *   true if x and y compare equal before x, y or n_bytes is exhausted.
+ *   Otherwise on a miscompare, returns false (and stops comparing). If return
+ *   is false and miscompare_idx is non-NULL, then index of first miscompared
+ *   byte written to *miscompare_idx.
+ *
+ * Notes:
+ *   x and y are symmetrical: they can be swapped and the result is the same.
+ *
+ *   Implementation is based on memcmp(). x and y segments may overlap.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+			 size_t n_bytes, size_t *miscompare_idx)
+{
+	bool equ = true;
+	size_t len;
+	size_t offset = 0;
+	struct sg_mapping_iter x_iter, y_iter;
+
+	if (n_bytes == 0)
+		return true;
+	sg_miter_start(&x_iter, x_sgl, x_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	sg_miter_start(&y_iter, y_sgl, y_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	if (!sg_miter_skip(&x_iter, x_skip))
+		goto fini;
+	if (!sg_miter_skip(&y_iter, y_skip))
+		goto fini;
+
+	while (offset < n_bytes) {
+		if (!sg_miter_next(&x_iter))
+			break;
+		if (!sg_miter_next(&y_iter))
+			break;
+		len = min3(x_iter.length, y_iter.length, n_bytes - offset);
+
+		equ = !memcmp(x_iter.addr, y_iter.addr, len);
+		if (!equ)
+			goto fini;
+		offset += len;
+		/* LIFO order is important when SG_MITER_ATOMIC is used */
+		y_iter.consumed = len;
+		sg_miter_stop(&y_iter);
+		x_iter.consumed = len;
+		sg_miter_stop(&x_iter);
+	}
+fini:
+	if (miscompare_idx && !equ) {
+		u8 *xp = x_iter.addr;
+		u8 *yp = y_iter.addr;
+		u8 *x_endp;
+
+		for (x_endp = xp + len ; xp < x_endp; ++xp, ++yp) {
+			if (*xp != *yp)
+				break;
+		}
+		*miscompare_idx = offset + len - (x_endp - xp);
+	}
+	sg_miter_stop(&y_iter);
+	sg_miter_stop(&x_iter);
+	return equ;
+}
+EXPORT_SYMBOL(sgl_compare_sgl_idx);
+
+/**
+ * sgl_compare_sgl - Compare x and y (both sgl_s)
+ * @x_sgl:		 x (left) sgl
+ * @x_nents:		 Number of SG entries in x (left) sgl
+ * @x_skip:		 Number of bytes to skip in x (left) before starting
+ * @y_sgl:		 y (right) sgl
+ * @y_nents:		 Number of SG entries in y (right) sgl
+ * @y_skip:		 Number of bytes to skip in y (right) before starting
+ * @n_bytes:		 The (maximum) number of bytes to compare
+ *
+ * Returns:
+ *   true if x and y compare equal before x, y or n_bytes is exhausted.
+ *   Otherwise on a miscompare, returns false (and stops comparing).
+ *
+ * Notes:
+ *   x and y are symmetrical: they can be swapped and the result is the same.
+ *
+ *   Implementation is based on memcmp(). x and y segments may overlap.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		     struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		     size_t n_bytes)
+{
+	return sgl_compare_sgl_idx(x_sgl, x_nents, x_skip, y_sgl, y_nents, y_skip, n_bytes, NULL);
+}
+EXPORT_SYMBOL(sgl_compare_sgl);
-- 
2.25.1



* [PATCH v6 4/4] scatterlist: add sgl_memset()
  2021-01-18 16:30 [PATCH v6 0/4] scatterlist: add new capabilities Douglas Gilbert
                   ` (2 preceding siblings ...)
  2021-01-18 16:30 ` [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function Douglas Gilbert
@ 2021-01-18 16:30 ` Douglas Gilbert
  3 siblings, 0 replies; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 16:30 UTC
  To: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel
  Cc: martin.petersen, jejb, bostroesser, ddiss, bvanassche, jgg

The existing sg_zero_buffer() function is a bit restrictive. For
example protection information (PI) blocks are usually initialized
to 0xff bytes. As its name suggests sgl_memset() is modelled on
memset(). One difference is the type of the val argument which is
u8 rather than int. Plus it returns the number of bytes (over)written.
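
A userspace model of those semantics, as a sketch only: sk_seg and
sk_memset() are hypothetical names, with sk_seg standing in for a
scatterlist element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sk_seg { unsigned char *buf; size_t len; };

/*
 * Write val to at most n_bytes bytes, starting skip bytes into the
 * list. Returns the number of bytes actually written.
 */
static size_t sk_memset(const struct sk_seg *segs, size_t nsegs, size_t skip,
			unsigned char val, size_t n_bytes)
{
	size_t done = 0, i;

	assert(segs != NULL);
	for (i = 0; i < nsegs && done < n_bytes; i++) {
		size_t off, len;

		if (skip >= segs[i].len) {	/* segment is skipped entirely */
			skip -= segs[i].len;
			continue;
		}
		off = skip;
		skip = 0;
		len = segs[i].len - off;
		if (len > n_bytes - done)
			len = n_bytes - done;
		memset(segs[i].buf + off, val, len);
		done += len;
	}
	return done;
}
```

Passing SIZE_MAX as n_bytes fills from the skip point to the end of the
list, and the return value tells the caller how many bytes that was.
Filling PI blocks with 0xff is then a single call with val set to 0xff.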

Change implementation of sg_zero_buffer() to call this new function.

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h | 20 +++++++++-
 lib/scatterlist.c           | 79 +++++++++++++++++++++----------------
 2 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 71be65f9ebb5..69e87280b44d 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -318,8 +318,6 @@ size_t sg_pcopy_from_buffer(struct scatterlist *sgl, unsigned int nents,
 			    const void *buf, size_t buflen, off_t skip);
 size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 			  void *buf, size_t buflen, off_t skip);
-size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
-		       size_t buflen, off_t skip);
 
 size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
 		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
@@ -333,6 +331,24 @@ bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t
 			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
 			 size_t n_bytes, size_t *miscompare_idx);
 
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+		  u8 val, size_t n_bytes);
+
+/**
+ * sg_zero_buffer - Zero-out a part of a SG list
+ * @sgl:		The SG list
+ * @nents:		Number of SG entries
+ * @buflen:		The number of bytes to zero out
+ * @skip:		Number of bytes to skip before zeroing
+ *
+ * Returns the number of bytes zeroed.
+ **/
+static inline size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
+				    size_t buflen, off_t skip)
+{
+	return sgl_memset(sgl, nents, skip, 0, buflen);
+}
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e3182de753d0..7e6acc67e9f6 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1023,41 +1023,6 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 }
 EXPORT_SYMBOL(sg_pcopy_to_buffer);
 
-/**
- * sg_zero_buffer - Zero-out a part of a SG list
- * @sgl:		 The SG list
- * @nents:		 Number of SG entries
- * @buflen:		 The number of bytes to zero out
- * @skip:		 Number of bytes to skip before zeroing
- *
- * Returns the number of bytes zeroed.
- **/
-size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
-		       size_t buflen, off_t skip)
-{
-	unsigned int offset = 0;
-	struct sg_mapping_iter miter;
-	unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_TO_SG;
-
-	sg_miter_start(&miter, sgl, nents, sg_flags);
-
-	if (!sg_miter_skip(&miter, skip))
-		return false;
-
-	while (offset < buflen && sg_miter_next(&miter)) {
-		unsigned int len;
-
-		len = min(miter.length, buflen - offset);
-		memset(miter.addr, 0, len);
-
-		offset += len;
-	}
-
-	sg_miter_stop(&miter);
-	return offset;
-}
-EXPORT_SYMBOL(sg_zero_buffer);
-
 /**
  * sgl_copy_sgl - Copy over a destination sgl from a source sgl
  * @d_sgl:		 Destination sgl
@@ -1240,3 +1205,47 @@ bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_sk
 	return sgl_compare_sgl_idx(x_sgl, x_nents, x_skip, y_sgl, y_nents, y_skip, n_bytes, NULL);
 }
 EXPORT_SYMBOL(sgl_compare_sgl);
+
+/**
+ * sgl_memset - set byte 'val' up to n_bytes times on SG list
+ * @sgl:		 The SG list
+ * @nents:		 Number of SG entries in sgl
+ * @skip:		 Number of bytes to skip before starting
+ * @val:		 byte value to write to sgl
+ * @n_bytes:		 The (maximum) number of bytes to modify
+ *
+ * Returns:
+ *   The number of bytes written.
+ *
+ * Notes:
+ *   Stops writing if either sgl or n_bytes is exhausted. If n_bytes is
+ *   set to SIZE_MAX then val will be written to each byte until the end
+ *   of sgl.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+		  u8 val, size_t n_bytes)
+{
+	size_t offset = 0;
+	size_t len;
+	struct sg_mapping_iter miter;
+
+	if (n_bytes == 0)
+		return 0;
+	sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+	if (!sg_miter_skip(&miter, skip))
+		goto fini;
+
+	while ((offset < n_bytes) && sg_miter_next(&miter)) {
+		len = min(miter.length, n_bytes - offset);
+		memset(miter.addr, val, len);
+		offset += len;
+	}
+fini:
+	sg_miter_stop(&miter);
+	return offset;
+}
+EXPORT_SYMBOL(sgl_memset);
+
-- 
2.25.1



* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 16:30 ` [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning Douglas Gilbert
@ 2021-01-18 18:28   ` Jason Gunthorpe
  2021-01-18 20:08     ` Douglas Gilbert
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-18 18:28 UTC
  To: Douglas Gilbert
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, ddiss, bvanassche

On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:

> After several flawed attempts to detect overflow, take the fastest
> route by stating as a pre-condition that the 'order' function argument
> cannot exceed 16 (2^16 * 4k = 256 MiB).

That doesn't help, the point of the overflow check is similar to
overflow checks in kcalloc: to prevent the routine from allocating
less memory than the caller might assume.

For instance ipr_store_update_fw() uses request_firmware() (which is
controlled by userspace) to drive the length argument to
sgl_alloc_order(). If userspace gives too large a value this will
corrupt kernel memory.

So this math:

  	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);

Needs to be checked; adding a precondition on order does not help. I
already proposed a straightforward algorithm you can use.

Jason


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 18:28   ` Jason Gunthorpe
@ 2021-01-18 20:08     ` Douglas Gilbert
  2021-01-18 20:24       ` Jason Gunthorpe
  2021-01-18 20:46       ` Bodo Stroesser
  0 siblings, 2 replies; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-18 20:08 UTC
  To: Jason Gunthorpe
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, ddiss, bvanassche

On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
> 
>> After several flawed attempts to detect overflow, take the fastest
>> route by stating as a pre-condition that the 'order' function argument
>> cannot exceed 16 (2^16 * 4k = 256 MiB).
> 
> That doesn't help, the point of the overflow check is similar to
> overflow checks in kcalloc: to prevent the routine from allocating
> less memory than the caller might assume.
> 
> For instance ipr_store_update_fw() uses request_firmware() (which is
> controlled by userspace) to drive the length argument to
> sgl_alloc_order(). If userspace gives too large a value this will
> corrupt kernel memory.
> 
> So this math:
> 
>    	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);

But that check itself overflows if order is too large (e.g. 65).
A pre-condition says that the caller must know or check a value
is sane, and if the user space can have a hand in the value passed
the caller _must_ check pre-conditions IMO. A pre-condition also
implies that the function's implementation will not have code to
check the pre-condition.

My "log of both sides" proposal at least got around the overflowing
left shift problem. And one reviewer, Bodo Stroesser, liked it.

> Needs to be checked; adding a precondition on order does not help. I
> already proposed a straightforward algorithm you can use.

It does help, it stops your proposed check from being flawed :-)

Giving a false sense of security seems more dangerous than a
pre-condition statement IMO. Bart's original overflow check (in
the mainline) limits length to 4GB (due to wrapping inside a 32
bit unsigned).

Also note there is another pre-condition statement in that function's
definition, namely that length cannot be 0.

So perhaps you, Bart Van Assche and Bodo Stroesser, should compare
notes and come up with a solution that you are _all_ happy with.
The pre-condition works for me and is the fastest. The 'length'
argument might be large, say > 1 GB [I use 1 GB in testing but
did try 4GB and found the bug I'm trying to fix] but having
individual elements greater than say 32 MB each does not
seem very practical (and fails on the systems that I test with).
In my testing the largest element size is 4 MB.


Doug Gilbert



* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 20:08     ` Douglas Gilbert
@ 2021-01-18 20:24       ` Jason Gunthorpe
  2021-01-18 21:22         ` Bodo Stroesser
  2021-01-18 20:46       ` Bodo Stroesser
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-18 20:24 UTC
  To: Douglas Gilbert
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, ddiss, bvanassche

On Mon, Jan 18, 2021 at 03:08:51PM -0500, Douglas Gilbert wrote:
> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
> > On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
> > 
> > > After several flawed attempts to detect overflow, take the fastest
> > > route by stating as a pre-condition that the 'order' function argument
> > > cannot exceed 16 (2^16 * 4k = 256 MiB).
> > 
> > That doesn't help, the point of the overflow check is similar to
> > overflow checks in kcalloc: to prevent the routine from allocating
> > less memory than the caller might assume.
> > 
> > For instance ipr_store_update_fw() uses request_firmware() (which is
> > controlled by userspace) to drive the length argument to
> > sgl_alloc_order(). If userspace gives too large a value this will
> > corrupt kernel memory.
> > 
> > So this math:
> > 
> >    	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> 
> But that check itself overflows if order is too large (e.g. 65).

I don't really care about order. It is always controlled by the kernel
and it is fine to just require it be low enough to not
overflow. length is the data under userspace control so math on it
must be checked for overflow.

> Also note there is another pre-condition statement in that function's
> definition, namely that length cannot be 0.

I don't see callers checking for that either. If it is true that length
0 can't be allowed, it should be blocked in the function.

Jason


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 20:08     ` Douglas Gilbert
  2021-01-18 20:24       ` Jason Gunthorpe
@ 2021-01-18 20:46       ` Bodo Stroesser
  1 sibling, 0 replies; 21+ messages in thread
From: Bodo Stroesser @ 2021-01-18 20:46 UTC (permalink / raw)
  To: dgilbert, Jason Gunthorpe
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, ddiss, bvanassche

On 18.01.21 21:08, Douglas Gilbert wrote:
> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
>> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
>>
>>> After several flawed attempts to detect overflow, take the fastest
>>> route by stating as a pre-condition that the 'order' function argument
>>> cannot exceed 16 (2^16 * 4k = 256 MiB).
>>
>> That doesn't help, the point of the overflow check is similar to
>> overflow checks in kcalloc: to prevent the routine from allocating
>> less memory than the caller might assume.
>>
>> For instance ipr_store_update_fw() uses request_firmware() (which is
>> controlled by userspace) to drive the length argument to
>> sgl_alloc_order(). If userspace gives too large a value this will
>> corrupt kernel memory.
>>
>> So this math:
>>
>>        nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> 
> But that check itself overflows if order is too large (e.g. 65).
> A pre-condition says that the caller must know or check a value
> is sane, and if the user space can have a hand in the value passed
> the caller _must_ check pre-conditions IMO. A pre-condition also
> implies that the function's implementation will not have code to
> check the pre-condition.
> 
> My "log of both sides" proposal at least got around the overflowing
> left shift problem. And one reviewer, Bodo Stroesser, liked it.

I added my Reviewed-by after you added a working check of nent overflow.
I did not object to the use of ilog() there. But now I think Jason is
right that the ilog usage is indeed a bit 'indirect'.

Anyway, I still think there should be a check for nent overflow.

> 
>> Needs to be checked, add a precondition to order does not help. I
>> already proposed a straightforward algorithm you can use.
> 
> It does help, it stops your proposed check from being flawed :-)
> 
> Giving a false sense of security seems more dangerous than a
> pre-condition statement IMO. Bart's original overflow check (in
> the mainline) limits length to 4GB (due to wrapping inside a 32
> bit unsigned).
> 
> Also note there is another pre-condition statement in that function's
> definition, namely that length cannot be 0.
> 
> So perhaps you, Bart Van Assche and Bodo Stroesser, should compare
> notes and come up with a solution that you are _all_ happy with.
> The pre-condition works for me and is the fastest. The 'length'
> argument might be large, say > 1 GB [I use 1 GB in testing but
> did try 4GB and found the bug I'm trying to fix] but having
> individual elements greater than say 32 MB each does not
> seem very practical (and fails on the systems that I test with).
> In my testing the largest element size is 4 MB.
> 
> 
> Doug Gilbert
> 


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 20:24       ` Jason Gunthorpe
@ 2021-01-18 21:22         ` Bodo Stroesser
  2021-01-18 23:48           ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Bodo Stroesser @ 2021-01-18 21:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Douglas Gilbert
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, ddiss, bvanassche

On 18.01.21 21:24, Jason Gunthorpe wrote:
> On Mon, Jan 18, 2021 at 03:08:51PM -0500, Douglas Gilbert wrote:
>> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
>>> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
>>>
>>>> After several flawed attempts to detect overflow, take the fastest
>>>> route by stating as a pre-condition that the 'order' function argument
>>>> cannot exceed 16 (2^16 * 4k = 256 MiB).
>>>
>>> That doesn't help, the point of the overflow check is similar to
>>> overflow checks in kcalloc: to prevent the routine from allocating
>>> less memory than the caller might assume.
>>>
>>> For instance ipr_store_update_fw() uses request_firmware() (which is
>>> controlled by userspace) to drive the length argument to
>>> sgl_alloc_order(). If userspace gives too large a value this will
>>> corrupt kernel memory.
>>>
>>> So this math:
>>>
>>>     	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
>>
>> But that check itself overflows if order is too large (e.g. 65).
> 
> I don't really care about order. It is always controlled by the kernel
> and it is fine to just require it be low enough to not
> overflow. length is the data under userspace control so math on it
> must be checked for overflow.
> 
>> Also note there is another pre-condition statement in that function's
>> definition, namely that length cannot be 0.
> 
> I don't see callers checking for that either, if it is true length 0
> can't be allowed it should be blocked in the function
> 
> Jason
> 

As already said, I also think there should be a check for length or
rather nent overflow.

I like the easy to understand check in your proposed code:

	if (length >> (PAGE_SHIFT + order) >= UINT_MAX)
		return NULL;


But I don't understand, why you open-coded the nent calculation:

	nent = length >> (PAGE_SHIFT + order);
	if (length & ((1ULL << (PAGE_SHIFT + order)) - 1))
		nent++;

Wouldn't it be better to keep the original line instead:

	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);

Or maybe even better:

	nent = DIV_ROUND_UP(length, PAGE_SIZE << order);


I think combining the above lines results in short and easily readable code:


	u32 elem_len;

	if (length >> (PAGE_SHIFT + order) >= UINT_MAX)
		return NULL;
	nent = DIV_ROUND_UP(length, PAGE_SIZE << order);

	if (chainable) {
		if (check_add_overflow(nent, 1, &nalloc))
			return NULL;
	} else {
		nalloc = nent;
	}


Thank you,
Bodo


	


* Re: [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function
  2021-01-18 16:30 ` [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function Douglas Gilbert
@ 2021-01-18 23:27   ` David Disseldorp
  2021-01-19  1:04     ` Douglas Gilbert
  0 siblings, 1 reply; 21+ messages in thread
From: David Disseldorp @ 2021-01-18 23:27 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, bvanassche, jgg

On Mon, 18 Jan 2021 11:30:05 -0500, Douglas Gilbert wrote:

> After enabling copies between scatter gather lists (sgl_s), another
> storage related operation is to compare two sgl_s. This new function
> is modelled on NVMe's Compare command and the SCSI VERIFY(BYTCHK=1)
> command. Like memcmp() this function returns false on the first
> miscompare and stops comparing.
> 
> A helper function called sgl_compare_sgl_idx() is added. It takes an
> additional parameter (miscompare_idx) which is a pointer. If that
> pointer is non-NULL and a miscompare is detected (i.e. the function
> returns false) then the byte index of the first miscompare is written
> to *miscompare_idx. Knowing the location of the first miscompare is
> needed to implement the SCSI COMPARE AND WRITE command properly.
> 
> Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>  include/linux/scatterlist.h |   8 +++
>  lib/scatterlist.c           | 109 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 117 insertions(+)
> 
> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
> index 3f836a3246aa..71be65f9ebb5 100644
> --- a/include/linux/scatterlist.h
> +++ b/include/linux/scatterlist.h
> @@ -325,6 +325,14 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
>  		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
>  		    size_t n_bytes);
>  
> +bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> +		     struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> +		     size_t n_bytes);
> +
> +bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> +			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> +			 size_t n_bytes, size_t *miscompare_idx);


This patch looks good and works fine as a replacement for
compare_and_write_do_cmp(). One minor suggestion would be to name it
sgl_equal() or similar, to perhaps better reflect the bool return and
avoid memcmp() confusion. Either way:
Reviewed-by: David Disseldorp <ddiss@suse.de>

Cheers, David


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 21:22         ` Bodo Stroesser
@ 2021-01-18 23:48           ` Jason Gunthorpe
  2021-01-19  1:27             ` Douglas Gilbert
  2021-01-19 17:24             ` Bodo Stroesser
  0 siblings, 2 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-18 23:48 UTC (permalink / raw)
  To: Bodo Stroesser
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On Mon, Jan 18, 2021 at 10:22:56PM +0100, Bodo Stroesser wrote:
> On 18.01.21 21:24, Jason Gunthorpe wrote:
> > On Mon, Jan 18, 2021 at 03:08:51PM -0500, Douglas Gilbert wrote:
> >> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
> >>> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
> >>>
> >>>> After several flawed attempts to detect overflow, take the fastest
> >>>> route by stating as a pre-condition that the 'order' function argument
> >>>> cannot exceed 16 (2^16 * 4k = 256 MiB).
> >>>
> >>> That doesn't help, the point of the overflow check is similar to
> >>> overflow checks in kcalloc: to prevent the routine from allocating
> >>> less memory than the caller might assume.
> >>>
> >>> For instance ipr_store_update_fw() uses request_firmware() (which is
> >>> controlled by userspace) to drive the length argument to
> > >>> sgl_alloc_order(). If userspace gives too large a value this will
> >>> corrupt kernel memory.
> >>>
> >>> So this math:
> >>>
> >>>     	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> >>
> >> But that check itself overflows if order is too large (e.g. 65).
> > 
> > I don't really care about order. It is always controlled by the kernel
> > and it is fine to just require it be low enough to not
> > overflow. length is the data under userspace control so math on it
> > must be checked for overflow.
> > 
> >> Also note there is another pre-condition statement in that function's
> >> definition, namely that length cannot be 0.
> > 
> > I don't see callers checking for that either, if it is true length 0
> > can't be allowed it should be blocked in the function
> > 
> > Jason
> > 
> 
> As already said, I also think there should be a check for length or
> rather nent overflow.
> 
> I like the easy to understand check in your proposed code:
> 
> 	if (length >> (PAGE_SHIFT + order) >= UINT_MAX)
> 		return NULL;
> 
> 
> But I don't understand, why you open-coded the nent calculation:
> 
> 	nent = length >> (PAGE_SHIFT + order);
> 	if (length & ((1ULL << (PAGE_SHIFT + order)) - 1))
> 		nent++;

It is necessary to properly check for overflow, because the easy to
understand check doesn't prove that round_up will work, only that >>
results in something that fits in an int and that +1 won't overflow
the int.

> Wouldn't it be better to keep the original line instead:
> 
> 	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);

This can overflow inside the round_up
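
A small userspace sketch of the difference, assuming a 64-bit length and
a power-of-two element size. demo_round_up() mirrors the shape of the
kernel's round_up() macro; all demo_* names are hypothetical.

```c
#include <stdint.h>

/* Same arithmetic as round_up(); 'y' must be a power of two. */
static uint64_t demo_round_up(uint64_t x, uint64_t y)
{
	return ((x - 1) | (y - 1)) + 1;	/* the final +1 can wrap to 0 */
}

/* Element count via round_up followed by a shift. */
static uint64_t demo_nent_round_up(uint64_t length, unsigned int shift)
{
	return demo_round_up(length, UINT64_C(1) << shift) >> shift;
}

/* Element count via shift first, then a conditional increment. */
static uint64_t demo_nent_open_coded(uint64_t length, unsigned int shift)
{
	uint64_t nent = length >> shift;

	if (length & ((UINT64_C(1) << shift) - 1))
		nent++;
	return nent;
}
```

For length = UINT64_MAX - 100 and shift = 12 the round_up variant yields
0, while the open-coded variant yields 2^52, which a later range check
can still reject. For ordinary lengths both agree.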

Jason


* Re: [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function
  2021-01-18 23:27   ` David Disseldorp
@ 2021-01-19  1:04     ` Douglas Gilbert
  2021-01-19 11:50       ` David Disseldorp
  0 siblings, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-19  1:04 UTC (permalink / raw)
  To: David Disseldorp
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, bvanassche, jgg

On 2021-01-18 6:27 p.m., David Disseldorp wrote:
> On Mon, 18 Jan 2021 11:30:05 -0500, Douglas Gilbert wrote:
> 
>> After enabling copies between scatter gather lists (sgl_s), another
>> storage related operation is to compare two sgl_s. This new function
>> is modelled on NVMe's Compare command and the SCSI VERIFY(BYTCHK=1)
>> command. Like memcmp() this function returns false on the first
>> miscompare and stops comparing.
>>
>> A helper function called sgl_compare_sgl_idx() is added. It takes an
>> additional parameter (miscompare_idx) which is a pointer. If that
>> pointer is non-NULL and a miscompare is detected (i.e. the function
>> returns false) then the byte index of the first miscompare is written
>> to *miscompare_idx. Knowing the location of the first miscompare is
>> needed to implement the SCSI COMPARE AND WRITE command properly.
>>
>> Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>   include/linux/scatterlist.h |   8 +++
>>   lib/scatterlist.c           | 109 ++++++++++++++++++++++++++++++++++++
>>   2 files changed, 117 insertions(+)
>>
>> diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
>> index 3f836a3246aa..71be65f9ebb5 100644
>> --- a/include/linux/scatterlist.h
>> +++ b/include/linux/scatterlist.h
>> @@ -325,6 +325,14 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
>>   		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
>>   		    size_t n_bytes);
>>   
>> +bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
>> +		     struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
>> +		     size_t n_bytes);
>> +
>> +bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
>> +			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
>> +			 size_t n_bytes, size_t *miscompare_idx);
> 
> 
> This patch looks good and works fine as a replacement for
> compare_and_write_do_cmp(). One minor suggestion would be to name it
> sgl_equal() or similar, to perhaps better reflect the bool return and
> avoid memcmp() confusion. Either way:
> Reviewed-by: David Disseldorp <ddiss@suse.de>

Thanks. NVMe calls the command that does this Compare and SCSI uses
COMPARE AND WRITE (and VERIFY(BYTCHK=1) ) but "equal" is fine with me.
There will be another patchset version (at least) so there is time
to change.

Do you want:
   - sgl_equal(...), or
   - sgl_equal_sgl(...) ?

Doug Gilbert



* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 23:48           ` Jason Gunthorpe
@ 2021-01-19  1:27             ` Douglas Gilbert
  2021-01-19 12:59               ` Jason Gunthorpe
  2021-01-19 17:24             ` Bodo Stroesser
  1 sibling, 1 reply; 21+ messages in thread
From: Douglas Gilbert @ 2021-01-19  1:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Bodo Stroesser
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, ddiss, bvanassche

On 2021-01-18 6:48 p.m., Jason Gunthorpe wrote:
> On Mon, Jan 18, 2021 at 10:22:56PM +0100, Bodo Stroesser wrote:
>> On 18.01.21 21:24, Jason Gunthorpe wrote:
>>> On Mon, Jan 18, 2021 at 03:08:51PM -0500, Douglas Gilbert wrote:
>>>> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
>>>>> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
>>>>>
>>>>>> After several flawed attempts to detect overflow, take the fastest
>>>>>> route by stating as a pre-condition that the 'order' function argument
>>>>>> cannot exceed 16 (2^16 * 4k = 256 MiB).
>>>>>
>>>>> That doesn't help, the point of the overflow check is similar to
>>>>> overflow checks in kcalloc: to prevent the routine from allocating
>>>>> less memory than the caller might assume.
>>>>>
>>>>> For instance ipr_store_update_fw() uses request_firmware() (which is
>>>>> controlled by userspace) to drive the length argument to
>>>>> sgl_alloc_order(). If userspace gives too large a value this will
>>>>> corrupt kernel memory.
>>>>>
>>>>> So this math:
>>>>>
>>>>>      	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
>>>>
>>>> But that check itself overflows if order is too large (e.g. 65).
>>>
>>> I don't really care about order. It is always controlled by the kernel
>>> and it is fine to just require it be low enough to not
>>> overflow. length is the data under userspace control so math on it
>>> must be checked for overflow.
>>>
>>>> Also note there is another pre-condition statement in that function's
>>>> definition, namely that length cannot be 0.
>>>
>>> I don't see callers checking for that either, if it is true length 0
>>> can't be allowed it should be blocked in the function
>>>
>>> Jason
>>>
>>
>> As already said, I also think there should be a check for length or
>> rather nent overflow.
>>
>> I like the easy to understand check in your proposed code:
>>
>> 	if (length >> (PAGE_SHIFT + order) >= UINT_MAX)
>> 		return NULL;
>>
>>
>> But I don't understand, why you open-coded the nent calculation:
>>
>> 	nent = length >> (PAGE_SHIFT + order);
>> 	if (length & ((1ULL << (PAGE_SHIFT + order)) - 1))
>> 		nent++;
> 
> It is necessary to properly check for overflow, because the easy to
> understand check doesn't prove that round_up will work, only that >>
> results in something that fits in an int and that +1 won't overflow
> the int.
> 
>> Wouldn't it be better to keep the original line instead:
>>
>> 	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> 
> This can overflow inside the round_up

To protect against the "unsigned long long" length being too big why
not pick a large power of two and if someone can justify a larger
value, they can send a patch.

         if (length > 64ULL * 1024 * 1024 * 1024)
		return NULL;

So 64 GiB or a similar calculation involving PAGE_SIZE. Compiler does
the multiplication and at run time there is only a 64 bit comparison.
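
The arithmetic behind such a cap can be sketched in userspace as
follows, assuming 4 KiB pages. The demo_* names and the DEMO_MAX_LENGTH
constant are hypothetical and only restate the 64 GiB figure above.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define DEMO_MAX_LENGTH	(UINT64_C(64) * 1024 * 1024 * 1024)	/* 64 GiB = 2^36 */

/* The proposed test: a single 64-bit comparison at run time. */
static bool demo_length_allowed(uint64_t length)
{
	return length <= DEMO_MAX_LENGTH;
}

/* Largest element count reachable under the cap for a given order. */
static uint64_t demo_max_nent(unsigned int order)
{
	return DEMO_MAX_LENGTH >> (DEMO_PAGE_SHIFT + order);
}
```

Under this cap the element count is at most 2^24 at order 0, so it fits
in 32 bits with room to spare. Note that this bounds the input rather
than checking the arithmetic itself.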


I tested 6 one GiB ramdisks on an 8 GiB machine, worked fine until
firefox was started. Then came the OOM killer ...

Doug Gilbert



* Re: [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function
  2021-01-19  1:04     ` Douglas Gilbert
@ 2021-01-19 11:50       ` David Disseldorp
  0 siblings, 0 replies; 21+ messages in thread
From: David Disseldorp @ 2021-01-19 11:50 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, linux-block, target-devel, linux-rdma, linux-kernel,
	martin.petersen, jejb, bostroesser, bvanassche, jgg

On Mon, 18 Jan 2021 20:04:20 -0500, Douglas Gilbert wrote:

> >> +bool sgl_compare_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> >> +		     struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> >> +		     size_t n_bytes);
> >> +
> >> +bool sgl_compare_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
> >> +			 struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
> >> +			 size_t n_bytes, size_t *miscompare_idx);  
> > 
> > 
> > This patch looks good and works fine as a replacement for
> > compare_and_write_do_cmp(). One minor suggestion would be to name it
> > sgl_equal() or similar, to perhaps better reflect the bool return and
> > avoid memcmp() confusion. Either way:
> > Reviewed-by: David Disseldorp <ddiss@suse.de>  
> 
> Thanks. NVMe calls the command that does this Compare and SCSI uses
> COMPARE AND WRITE (and VERIFY(BYTCHK=1) ) but "equal" is fine with me.
> There will be another patchset version (at least) so there is time
> to change.
> 
> Do you want:
>    - sgl_equal(...), or
>    - sgl_equal_sgl(...) ?

I'd probably prefer the former as it's shorter, but I don't feel
strongly about it. The latter would make sense if you expect sgl compare
helpers for other buffer types.

Cheers, David


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-19  1:27             ` Douglas Gilbert
@ 2021-01-19 12:59               ` Jason Gunthorpe
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-19 12:59 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: Bodo Stroesser, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On Mon, Jan 18, 2021 at 08:27:09PM -0500, Douglas Gilbert wrote:

> To protect against the "unsigned long long" length being too big why
> not pick a large power of two and if someone can justify a larger
> value, they can send a patch.
> 
>         if (length > 64ULL * 1024 * 1024 * 1024)
> 		return NULL;

That is not how we protect against arithmetic overflows in the kernel

Jason


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-18 23:48           ` Jason Gunthorpe
  2021-01-19  1:27             ` Douglas Gilbert
@ 2021-01-19 17:24             ` Bodo Stroesser
  2021-01-19 18:03               ` Jason Gunthorpe
  1 sibling, 1 reply; 21+ messages in thread
From: Bodo Stroesser @ 2021-01-19 17:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On 19.01.21 00:48, Jason Gunthorpe wrote:
> On Mon, Jan 18, 2021 at 10:22:56PM +0100, Bodo Stroesser wrote:
>> On 18.01.21 21:24, Jason Gunthorpe wrote:
>>> On Mon, Jan 18, 2021 at 03:08:51PM -0500, Douglas Gilbert wrote:
>>>> On 2021-01-18 1:28 p.m., Jason Gunthorpe wrote:
>>>>> On Mon, Jan 18, 2021 at 11:30:03AM -0500, Douglas Gilbert wrote:
>>>>>
>>>>>> After several flawed attempts to detect overflow, take the fastest
>>>>>> route by stating as a pre-condition that the 'order' function argument
>>>>>> cannot exceed 16 (2^16 * 4k = 256 MiB).
>>>>>
>>>>> That doesn't help, the point of the overflow check is similar to
>>>>> overflow checks in kcalloc: to prevent the routine from allocating
>>>>> less memory than the caller might assume.
>>>>>
>>>>> For instance ipr_store_update_fw() uses request_firmware() (which is
>>>>> controlled by userspace) to drive the length argument to
>>>>> sgl_alloc_order(). If userspace gives too large a value this will
>>>>> corrupt kernel memory.
>>>>>
>>>>> So this math:
>>>>>
>>>>>      	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
>>>>
>>>> But that check itself overflows if order is too large (e.g. 65).
>>>
>>> I don't really care about order. It is always controlled by the kernel
>>> and it is fine to just require it be low enough to not
>>> overflow. length is the data under userspace control so math on it
>>> must be checked for overflow.
>>>
>>>> Also note there is another pre-condition statement in that function's
>>>> definition, namely that length cannot be 0.
>>>
>>> I don't see callers checking for that either, if it is true length 0
>>> can't be allowed it should be blocked in the function
>>>
>>> Jason
>>>
>>
>> As already said, I also think there should be a check for length or
>> rather nent overflow.
>>
>> I like the easy to understand check in your proposed code:
>>
>> 	if (length >> (PAGE_SHIFT + order) >= UINT_MAX)
>> 		return NULL;
>>
>>
>> But I don't understand, why you open-coded the nent calculation:
>>
>> 	nent = length >> (PAGE_SHIFT + order);
>> 	if (length & ((1ULL << (PAGE_SHIFT + order)) - 1))
>> 		nent++;
> 
> It is necessary to properly check for overflow, because the easy to
> understand check doesn't prove that round_up will work, only that >>
> results in something that fits in an int and that +1 won't overflow
> the int.
> 
>> Wouldn't it be better to keep the original line instead:
>>
>> 	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
> 
> This can overflow inside the round_up

I had a second look into math.h, but I don't find any reason why 
round_up could overflow. Can you give a hint please?

Regarding the overflow checks: would it be a good idea to not check
length >> (PAGE_SHIFT + order) in the beginning, but check nalloc
immediately before the kmalloc_array() as the only overrun check:

	if ((unsigned long long)nalloc << (PAGE_SHIFT + order) < length)
		return NULL;

-Bodo


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-19 17:24             ` Bodo Stroesser
@ 2021-01-19 18:03               ` Jason Gunthorpe
  2021-01-19 18:08                 ` Bodo Stroesser
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-19 18:03 UTC (permalink / raw)
  To: Bodo Stroesser
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On Tue, Jan 19, 2021 at 06:24:49PM +0100, Bodo Stroesser wrote:
> 
> I had a second look into math.h, but I don't find any reason why round_up
> could overflow. Can you give a hint please?

#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
                                                    ^^^^^

That +1 can overflow

It looks like it would not be so bad to implement some
check_round_up_overflow() if people prefer

Jason


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-19 18:03               ` Jason Gunthorpe
@ 2021-01-19 18:08                 ` Bodo Stroesser
  2021-01-19 18:17                   ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Bodo Stroesser @ 2021-01-19 18:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On 19.01.21 19:03, Jason Gunthorpe wrote:
> On Tue, Jan 19, 2021 at 06:24:49PM +0100, Bodo Stroesser wrote:
>>
>> I had a second look into math.h, but I don't find any reason why round_up
>> could overflow. Can you give a hint please?
> 
> #define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
>                                                      ^^^^^
> 
> That +1 can overflow

But that would be an unsigned long long overflow. I considered this to
not be relevant.

> 
> It looks like it would not be so bad to implement some
> check_round_up_overflow() if people prefer
> 
> Jason
> 


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-19 18:08                 ` Bodo Stroesser
@ 2021-01-19 18:17                   ` Jason Gunthorpe
  2021-01-19 18:39                     ` Bodo Stroesser
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2021-01-19 18:17 UTC (permalink / raw)
  To: Bodo Stroesser
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On Tue, Jan 19, 2021 at 07:08:32PM +0100, Bodo Stroesser wrote:
> On 19.01.21 19:03, Jason Gunthorpe wrote:
> > On Tue, Jan 19, 2021 at 06:24:49PM +0100, Bodo Stroesser wrote:
> > > 
> > > I had a second look into math.h, but I don't find any reason why round_up
> > > could overflow. Can you give a hint please?
> > 
> > #define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> >                                                      ^^^^^
> > 
> > That +1 can overflow
> 
> But that would be an unsigned long long overflow. I considered this to
> not be relevant.

Why not? It still makes nents 0 and still causes a bad bug

Jason


* Re: [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning
  2021-01-19 18:17                   ` Jason Gunthorpe
@ 2021-01-19 18:39                     ` Bodo Stroesser
  0 siblings, 0 replies; 21+ messages in thread
From: Bodo Stroesser @ 2021-01-19 18:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Douglas Gilbert, linux-scsi, linux-block, target-devel,
	linux-rdma, linux-kernel, martin.petersen, jejb, ddiss,
	bvanassche

On 19.01.21 19:17, Jason Gunthorpe wrote:
> On Tue, Jan 19, 2021 at 07:08:32PM +0100, Bodo Stroesser wrote:
>> On 19.01.21 19:03, Jason Gunthorpe wrote:
>>> On Tue, Jan 19, 2021 at 06:24:49PM +0100, Bodo Stroesser wrote:
>>>>
>>>> I had a second look into math.h, but I don't find any reason why round_up
>>>> could overflow. Can you give a hint please?
>>>
>>> #define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
>>>                                                       ^^^^^
>>>
>>> That +1 can overflow
>>
>> But that would be an unsigned long long overflow. I considered this to
>> not be relevant.
> 
> Why not? It still makes nents 0 and still causes a bad bug
> 

Generally speaking, you are of course right.

OTOH, if someone tries to allocate such big sgls, then we will run into
trouble during memory allocation even without overrun.

Anyway, if we first calculate nent and nalloc and then check with

	if ((unsigned long long)nalloc << (PAGE_SHIFT + order) < length)
		return NULL;

I think we would have checked against all kinds of overrun in a single
step. Or am I missing something?
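
How this single late check behaves on a few inputs can be tried out with
a userspace sketch. It assumes 4 KiB pages, 32-bit nent and nalloc, that
nalloc = nent + 1 is allowed to wrap, and an order small enough that the
final shift does not lose bits. The demo_* names are hypothetical, and a
handful of cases is of course not a proof.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Same arithmetic as round_up(); 'y' must be a power of two. */
static uint64_t demo_round_up(uint64_t x, uint64_t y)
{
	return ((x - 1) | (y - 1)) + 1;
}

/* Returns true if the late check would reject the request. */
static bool demo_rejected(uint64_t length, unsigned int order,
			  bool chainable)
{
	unsigned int shift = DEMO_PAGE_SHIFT + order;
	/* Both steps below may silently wrap or truncate. */
	uint32_t nent = (uint32_t)(demo_round_up(length, UINT64_C(1) << shift)
				   >> shift);
	uint32_t nalloc = chainable ? (uint32_t)(nent + 1) : nent;

	/* The proposed check: do the allocated elements cover length? */
	return ((uint64_t)nalloc << shift) < length;
}
```

In this sketch an ordinary request such as 8193 bytes passes, while a
length near UINT64_MAX (round_up wraps to 0), a length of 2^44 + 4096
(nent truncates to 1) and a chainable request whose nent is UINT_MAX
(nalloc wraps to 0) are all rejected.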

Bodo




end of thread, other threads:[~2021-01-19 21:26 UTC | newest]

Thread overview: 21+ messages
2021-01-18 16:30 [PATCH v6 0/4] scatterlist: add new capabilities Douglas Gilbert
2021-01-18 16:30 ` [PATCH v6 1/4] sgl_alloc_order: remove 4 GiB limit, sgl_free() warning Douglas Gilbert
2021-01-18 18:28   ` Jason Gunthorpe
2021-01-18 20:08     ` Douglas Gilbert
2021-01-18 20:24       ` Jason Gunthorpe
2021-01-18 21:22         ` Bodo Stroesser
2021-01-18 23:48           ` Jason Gunthorpe
2021-01-19  1:27             ` Douglas Gilbert
2021-01-19 12:59               ` Jason Gunthorpe
2021-01-19 17:24             ` Bodo Stroesser
2021-01-19 18:03               ` Jason Gunthorpe
2021-01-19 18:08                 ` Bodo Stroesser
2021-01-19 18:17                   ` Jason Gunthorpe
2021-01-19 18:39                     ` Bodo Stroesser
2021-01-18 20:46       ` Bodo Stroesser
2021-01-18 16:30 ` [PATCH v6 2/4] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
2021-01-18 16:30 ` [PATCH v6 3/4] scatterlist: add sgl_compare_sgl() function Douglas Gilbert
2021-01-18 23:27   ` David Disseldorp
2021-01-19  1:04     ` Douglas Gilbert
2021-01-19 11:50       ` David Disseldorp
2021-01-18 16:30 ` [PATCH v6 4/4] scatterlist: add sgl_memset() Douglas Gilbert
