* [PATCH v2 0/5] scatterlist: add operations for scsi_debug
@ 2022-11-12 19:49 Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit Douglas Gilbert
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

The scsi_debug driver is essentially a ramdisk dressed up as a SCSI host
with one or more SCSI devices attached. Like all low-level SCSI drivers,
the scsi_debug driver receives data from, and provides data to, the SCSI
mid-level (and the block layer) using scatterlists, whose interface is
found in include/linux/scatterlist.h.

After trying kmalloc() and then vmalloc() based storage for the scsi_debug
driver, it was found that certain SCSI commands can be optimized if one
or more scatterlists are used as the backing store instead. The SCSI
command that benefits the most is VERIFY(BYTCHK=1), whose NVMe equivalent
is COMPARE. These commands carry a data-out buffer provided by an
application which the storage device compares with the data held at the
LBA (for the count of blocks) given in the command. In this case the
sgl_equal_sgl() function can be used instead of setting up a temporary
buffer.

The implementations of the more common SCSI READ and WRITE commands are
simplified by using the sgl_copy_sgl() function.
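
As a rough illustration (not from any of these patches; the helper name
and its parameters are invented, only sgl_equal_sgl() comes from this
series), a VERIFY(BYTCHK=1) service could compare the command's data-out
sgl directly against the store's sgl:

  static bool verify_bytchk1(struct scatterlist *store_sgl,
                             unsigned int store_nents, off_t store_skip,
                             struct scsi_cmnd *scp, size_t n_bytes)
  {
          struct scsi_data_buffer *sdb = &scp->sdb;

          /* true if the store matches the application's data-out buffer */
          return sgl_equal_sgl(store_sgl, store_nents, store_skip,
                               sdb->table.sgl, sdb->table.nents, 0, n_bytes);
  }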

The first patch in this series removes an undocumented 4 GiB limit in
the existing sgl_alloc_order() function.

In the final patch of this series, the scsi_debug driver uses the
new scatterlist facilities to replace its vmalloc() backing store
with an sgl_alloc_order() based store. Also, several loops based on
memcpy() and memcmp() are replaced by the new scatterlist copy
and 'equal' functions.

Changes since v1 (sent to linux-scsi list on 20221023)
  - in sgl_alloc_order() add a check that the order argument is less
    than MAX_ORDER; this protects the following call to round_up()
  - in sdeb_sgl_cmp_buf() within scsi_debug.c remove the call to
    sg_miter_stop() as suggested by a reviewer


Douglas Gilbert (5):
  sgl_alloc_order: remove 4 GiB limit
  scatterlist: add sgl_copy_sgl() function
  scatterlist: add sgl_equal_sgl() function
  scatterlist: add sgl_memset()
  scsi_debug: change store from vmalloc to sgl

 drivers/scsi/Kconfig        |   3 +-
 drivers/scsi/scsi_debug.c   | 442 ++++++++++++++++++++++++------------
 include/linux/scatterlist.h |  33 ++-
 lib/scatterlist.c           | 255 ++++++++++++++++++---
 4 files changed, 562 insertions(+), 171 deletions(-)

-- 
2.37.2



* [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit
  2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
@ 2022-11-12 19:49 ` Douglas Gilbert
  2022-11-15 20:33   ` Jason Gunthorpe
  2022-11-12 19:49 ` [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

This patch fixes a check done by sgl_alloc_order() before it starts
any allocations. The comment in the original said: "Check for integer
overflow" but the right-hand side of the expression in the condition
is resolved as a u32, so it cannot exceed UINT32_MAX (4 GiB), which
means 'length' cannot exceed that value.
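
To make the failure concrete, here is a worked example (an illustration,
not from the patch; it assumes PAGE_SHIFT is 12 and order is 0, with
names taken from the old code):

  unsigned long long length = 1ULL << 32;  /* exactly 4 GiB */
  unsigned int nent = round_up(length, PAGE_SIZE) >> PAGE_SHIFT;  /* 0x100000 */

  /* 'nent << PAGE_SHIFT' is evaluated in 32 bits: 0x100000 << 12 wraps
   * to 0, so 'length > (nent << PAGE_SHIFT)' is true and the function
   * returned NULL for any length of 4 GiB or more. */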

This function may be used to replace vmalloc(unsigned long) for a
large allocation (e.g. a ramdisk). vmalloc has no limit at 4 GiB, so
it seems unreasonable that sgl_alloc_order(), whose length type is
unsigned long long, should be limited to 4 GiB.

Solutions to this issue were discussed by Jason Gunthorpe
<jgg@ziepe.ca> and Bodo Stroesser <bostroesser@gmail.com>. This
version is based on a linux-scsi post by Jason titled: "Re:
[PATCH v7 1/4] sgl_alloc_order: remove 4 GiB limit" dated 20220201.

An earlier patch fixed a memory leak in sgl_alloc_order() due to the
misuse of sgl_free(). Take the opportunity to put a one line comment
above sgl_free()'s declaration warning that it is not suitable when
order > 0.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |  1 +
 lib/scatterlist.c           | 23 ++++++++++++++---------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 375a5e90d86a..0930755a756e 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -426,6 +426,7 @@ struct scatterlist *sgl_alloc(unsigned long long length, gfp_t gfp,
 			      unsigned int *nent_p);
 void sgl_free_n_order(struct scatterlist *sgl, int nents, int order);
 void sgl_free_order(struct scatterlist *sgl, int order);
+/* Only use sgl_free() when order is 0 */
 void sgl_free(struct scatterlist *sgl);
 #endif /* CONFIG_SGL_ALLOC */
 
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c8c3d675845c..ee69d33d1228 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -585,13 +585,16 @@ EXPORT_SYMBOL(sg_alloc_table_from_pages_segment);
 #ifdef CONFIG_SGL_ALLOC
 
 /**
- * sgl_alloc_order - allocate a scatterlist and its pages
+ * sgl_alloc_order - allocate a scatterlist with equally sized elements each
+ *		     of which has 2^@order contiguous pages
  * @length: Length in bytes of the scatterlist. Must be at least one
- * @order: Second argument for alloc_pages()
+ * @order:  Second argument for alloc_pages(). Each sgl element size will
+ *	    be (PAGE_SIZE*2^@order) bytes. @order must be less than MAX_ORDER.
  * @chainable: Whether or not to allocate an extra element in the scatterlist
- *	for scatterlist chaining purposes
+ *	       for scatterlist chaining purposes
  * @gfp: Memory allocation flags
- * @nent_p: [out] Number of entries in the scatterlist that have pages
+ * @nent_p: [out] Number of entries in the scatterlist that have pages.
+ *		  Ignored if @nent_p is NULL.
  *
  * Returns: A pointer to an initialized scatterlist or %NULL upon failure.
  */
@@ -601,14 +604,16 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
 {
 	struct scatterlist *sgl, *sg;
 	struct page *page;
-	unsigned int nent, nalloc;
+	uint64_t nent;
+	unsigned int nalloc;
 	u32 elem_len;
 
+	if (WARN_ON_ONCE(order >= MAX_ORDER))
+		return NULL;
 	nent = round_up(length, PAGE_SIZE << order) >> (PAGE_SHIFT + order);
-	/* Check for integer overflow */
-	if (length > (nent << (PAGE_SHIFT + order)))
+	if (nent > UINT_MAX)
 		return NULL;
-	nalloc = nent;
+	nalloc = (unsigned int)nent;
 	if (chainable) {
 		/* Check for integer overflow */
 		if (nalloc + 1 < nalloc)
@@ -636,7 +641,7 @@ struct scatterlist *sgl_alloc_order(unsigned long long length,
 	}
 	WARN_ONCE(length, "length = %lld\n", length);
 	if (nent_p)
-		*nent_p = nent;
+		*nent_p = (unsigned int)nent;
 	return sgl;
 }
 EXPORT_SYMBOL(sgl_alloc_order);
-- 
2.37.2



* [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function
  2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit Douglas Gilbert
@ 2022-11-12 19:49 ` Douglas Gilbert
  2022-11-16  5:59   ` Christoph Hellwig
  2022-11-12 19:49 ` [PATCH v2 3/5] scatterlist: add sgl_equal_sgl() function Douglas Gilbert
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

Both the SCSI and NVMe subsystems receive user data from the block
layer in scatterlist_s (aka scatter gather lists (sgl_s), which are
often arrays). If drivers in those subsystems represent storage
(e.g. a ramdisk) or cache "hot" user data then they may also
choose to use scatterlist_s. Currently there are no sgl to sgl
operations in the kernel. Start with a sgl to sgl copy. Copying
stops when the first of these is exhausted: the requested number
of bytes, the source sgl, or the destination sgl. So the
destination sgl will _not_ grow.
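
A minimal usage sketch (the wrapper and the zero skip values are
illustrative; only sgl_copy_sgl() comes from this patch):

  static size_t forward_bytes(struct scatterlist *d_sgl, unsigned int d_nents,
                              struct scatterlist *s_sgl, unsigned int s_nents,
                              size_t n_bytes)
  {
          size_t copied = sgl_copy_sgl(d_sgl, d_nents, 0 /* d_skip */,
                                       s_sgl, s_nents, 0 /* s_skip */, n_bytes);

          if (copied < n_bytes)  /* source or destination sgl was exhausted */
                  pr_debug("short copy: %zu of %zu bytes\n", copied, n_bytes);
          return copied;
  }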

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |  4 ++
 lib/scatterlist.c           | 74 +++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 0930755a756e..cea1edd246cb 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -445,6 +445,10 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
 		       size_t buflen, off_t skip);
 
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+		    size_t n_bytes);
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index ee69d33d1228..67a3cd04262b 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1090,3 +1090,77 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
 	return offset;
 }
 EXPORT_SYMBOL(sg_zero_buffer);
+
+/**
+ * sgl_copy_sgl - Copy over a destination sgl from a source sgl
+ * @d_sgl:		 Destination sgl
+ * @d_nents:		 Number of SG entries in destination sgl
+ * @d_skip:		 Number of bytes to skip in destination before starting
+ * @s_sgl:		 Source sgl
+ * @s_nents:		 Number of SG entries in source sgl
+ * @s_skip:		 Number of bytes to skip in source before starting
+ * @n_bytes:		 The (maximum) number of bytes to copy
+ *
+ * Returns:
+ *   The number of copied bytes.
+ *
+ * Notes:
+ *   Destination arguments appear before the source arguments, as with memcpy().
+ *
+ *   Stops copying if any of d_sgl, s_sgl or n_bytes is exhausted.
+ *
+ *   Since memcpy() is used, overlapping copies (where d_sgl and s_sgl belong
+ *   to the same sgl and the copy regions overlap) are not supported.
+ *
+ *   Large copies are broken into copy segments whose sizes may vary. Those
+ *   copy segment sizes are chosen by the min3() statement in the code below.
+ *   Since SG_MITER_ATOMIC is used for both sides, each copy segment is started
+ *   with kmap_atomic() [in sg_miter_next()] and completed with kunmap_atomic()
+ *   [in sg_miter_stop()]. This means pre-emption is inhibited for relatively
+ *   short periods even in very large copies.
+ *
+ *   If d_skip is large, potentially spanning multiple elements, then some
+ *   integer arithmetic to adjust d_sgl may improve performance. For example
+ *   if d_sgl is built using sgl_alloc_order(chainable=false) then the sgl
+ *   will be an array with equally sized segments facilitating that
+ *   arithmetic. The suggestion applies to s_skip, s_sgl and s_nents as well.
+ *
+ **/
+size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
+		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
+		    size_t n_bytes)
+{
+	size_t len;
+	size_t offset = 0;
+	struct sg_mapping_iter d_iter, s_iter;
+
+	if (n_bytes == 0)
+		return 0;
+	sg_miter_start(&s_iter, s_sgl, s_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	sg_miter_start(&d_iter, d_sgl, d_nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+	if (!sg_miter_skip(&s_iter, s_skip))
+		goto fini;
+	if (!sg_miter_skip(&d_iter, d_skip))
+		goto fini;
+
+	while (offset < n_bytes) {
+		if (!sg_miter_next(&s_iter))
+			break;
+		if (!sg_miter_next(&d_iter))
+			break;
+		len = min3(d_iter.length, s_iter.length, n_bytes - offset);
+
+		memcpy(d_iter.addr, s_iter.addr, len);
+		offset += len;
+		/* LIFO order (stop d_iter before s_iter) needed with SG_MITER_ATOMIC */
+		d_iter.consumed = len;
+		sg_miter_stop(&d_iter);
+		s_iter.consumed = len;
+		sg_miter_stop(&s_iter);
+	}
+fini:
+	sg_miter_stop(&d_iter);
+	sg_miter_stop(&s_iter);
+	return offset;
+}
+EXPORT_SYMBOL(sgl_copy_sgl);
-- 
2.37.2



* [PATCH v2 3/5] scatterlist: add sgl_equal_sgl() function
  2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
@ 2022-11-12 19:49 ` Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 4/5] scatterlist: add sgl_memset() Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 5/5] scsi_debug: change store from vmalloc to sgl Douglas Gilbert
  4 siblings, 0 replies; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi
  Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg,
	David Disseldorp

After enabling copies between scatter gather lists (sgl_s), another
storage-related operation is to compare two sgl_s for equality. This
new function is designed to partially implement NVMe's Compare
command and the SCSI VERIFY(BYTCHK=1) command. Like memcmp(), this
function begins scanning at the start of each sgl and stops
comparing, returning false, at the first miscompare.

The sgl_equal_sgl_idx() function additionally yields the index (i.e.
byte position) of the first miscompare. The additional parameter,
miscompare_idx, is a pointer. If it is non-NULL and a miscompare is
detected (i.e. the function returns false) then the byte index of
the first miscompare is written to *miscompare_idx. Knowing the
location of the first miscompare is needed to properly implement
the SCSI COMPARE AND WRITE command.
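
A sketch of that usage (the wrapper is illustrative; only
sgl_equal_sgl_idx() comes from this patch):

  static bool cmp_and_report(struct scatterlist *x_sgl, unsigned int x_nents,
                             struct scatterlist *y_sgl, unsigned int y_nents,
                             size_t n_bytes)
  {
          size_t miscompare_idx;

          if (sgl_equal_sgl_idx(x_sgl, x_nents, 0, y_sgl, y_nents, 0,
                                n_bytes, &miscompare_idx))
                  return true;
          /* COMPARE AND WRITE reports this offset back in the sense
           * data's INFORMATION field */
          pr_debug("first miscompare at byte %zu of %zu\n",
                   miscompare_idx, n_bytes);
          return false;
  }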

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h |   8 +++
 lib/scatterlist.c           | 110 ++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cea1edd246cb..e1552a3e9e13 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -449,6 +449,14 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
 		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
 		    size_t n_bytes);
 
+bool sgl_equal_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		   struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		   size_t n_bytes);
+
+bool sgl_equal_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		       struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		       size_t n_bytes, size_t *miscompare_idx);
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 67a3cd04262b..6b3f1931601d 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1164,3 +1164,113 @@ size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_ski
 	return offset;
 }
 EXPORT_SYMBOL(sgl_copy_sgl);
+
+/**
+ * sgl_equal_sgl_idx - check if x and y (both sgl_s) compare equal, report
+ *		       index for first unequal bytes
+ * @x_sgl:		 x (left) sgl
+ * @x_nents:		 Number of SG entries in x (left) sgl
+ * @x_skip:		 Number of bytes to skip in x (left) before starting
+ * @y_sgl:		 y (right) sgl
+ * @y_nents:		 Number of SG entries in y (right) sgl
+ * @y_skip:		 Number of bytes to skip in y (right) before starting
+ * @n_bytes:		 The (maximum) number of bytes to compare
+ * @miscompare_idx:	 if return is false, index of first miscompare written
+ *			 to this pointer (if non-NULL). Value will be < n_bytes
+ *
+ * Returns:
+ *   true if x and y compare equal before x, y or n_bytes is exhausted.
+ *   Otherwise on a miscompare, returns false (and stops comparing). If the
+ *   return is false and miscompare_idx is non-NULL, then the index of the
+ *   first miscompared byte is written to *miscompare_idx.
+ *
+ * Notes:
+ *   x and y are symmetrical: they can be swapped and the result is the same.
+ *
+ *   Implementation is based on memcmp(). x and y segments may overlap.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+bool sgl_equal_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		       struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		       size_t n_bytes, size_t *miscompare_idx)
+{
+	bool equ = true;
+	size_t len;
+	size_t offset = 0;
+	struct sg_mapping_iter x_iter, y_iter;
+
+	if (n_bytes == 0)
+		return true;
+	sg_miter_start(&x_iter, x_sgl, x_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	sg_miter_start(&y_iter, y_sgl, y_nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	if (!sg_miter_skip(&x_iter, x_skip))
+		goto fini;
+	if (!sg_miter_skip(&y_iter, y_skip))
+		goto fini;
+
+	while (offset < n_bytes) {
+		if (!sg_miter_next(&x_iter))
+			break;
+		if (!sg_miter_next(&y_iter))
+			break;
+		len = min3(x_iter.length, y_iter.length, n_bytes - offset);
+
+		equ = !memcmp(x_iter.addr, y_iter.addr, len);
+		if (!equ)
+			goto fini;
+		offset += len;
+		/* LIFO order is important when SG_MITER_ATOMIC is used */
+		y_iter.consumed = len;
+		sg_miter_stop(&y_iter);
+		x_iter.consumed = len;
+		sg_miter_stop(&x_iter);
+	}
+fini:
+	if (miscompare_idx && !equ) {
+		u8 *xp = x_iter.addr;
+		u8 *yp = y_iter.addr;
+		u8 *x_endp;
+
+		for (x_endp = xp + len ; xp < x_endp; ++xp, ++yp) {
+			if (*xp != *yp)
+				break;
+		}
+		*miscompare_idx = offset + len - (x_endp - xp);
+	}
+	sg_miter_stop(&y_iter);
+	sg_miter_stop(&x_iter);
+	return equ;
+}
+EXPORT_SYMBOL(sgl_equal_sgl_idx);
+
+/**
+ * sgl_equal_sgl - check if x and y (both sgl_s) compare equal
+ * @x_sgl:		 x (left) sgl
+ * @x_nents:		 Number of SG entries in x (left) sgl
+ * @x_skip:		 Number of bytes to skip in x (left) before starting
+ * @y_sgl:		 y (right) sgl
+ * @y_nents:		 Number of SG entries in y (right) sgl
+ * @y_skip:		 Number of bytes to skip in y (right) before starting
+ * @n_bytes:		 The (maximum) number of bytes to compare
+ *
+ * Returns:
+ *   true if x and y compare equal before x, y or n_bytes is exhausted.
+ *   Otherwise on a miscompare, returns false (and stops comparing).
+ *
+ * Notes:
+ *   x and y are symmetrical: they can be swapped and the result is the same.
+ *
+ *   Implementation is based on memcmp(). x and y segments may overlap.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+bool sgl_equal_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip,
+		   struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
+		   size_t n_bytes)
+{
+	return sgl_equal_sgl_idx(x_sgl, x_nents, x_skip, y_sgl, y_nents, y_skip, n_bytes, NULL);
+}
+EXPORT_SYMBOL(sgl_equal_sgl);
-- 
2.37.2



* [PATCH v2 4/5] scatterlist: add sgl_memset()
  2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
                   ` (2 preceding siblings ...)
  2022-11-12 19:49 ` [PATCH v2 3/5] scatterlist: add sgl_equal_sgl() function Douglas Gilbert
@ 2022-11-12 19:49 ` Douglas Gilbert
  2022-11-12 19:49 ` [PATCH v2 5/5] scsi_debug: change store from vmalloc to sgl Douglas Gilbert
  4 siblings, 0 replies; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

The existing sg_zero_buffer() function is a bit restrictive. For
example, protection information (PI) blocks are usually initialized
to 0xff bytes. As its name suggests, sgl_memset() is modelled on
memset(). One difference is the type of the val argument, which is
u8 rather than int. Plus it returns the number of bytes (over)written.

Change the implementation of sg_zero_buffer() to call this new function.
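
For example (a sketch, not from the patch; 'pi_sgl' and 'pi_nents' are
placeholder names), a PI area could be initialized like this:

  /* an n_bytes of SIZE_MAX writes 0xff up to the end of the sgl */
  size_t written = sgl_memset(pi_sgl, pi_nents, 0 /* skip */, 0xff, SIZE_MAX);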

Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/linux/scatterlist.h | 20 +++++++++-
 lib/scatterlist.c           | 78 ++++++++++++++++++++-----------------
 2 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index e1552a3e9e13..dbcf0f6fd8d9 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -442,8 +442,6 @@ size_t sg_pcopy_from_buffer(struct scatterlist *sgl, unsigned int nents,
 			    const void *buf, size_t buflen, off_t skip);
 size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 			  void *buf, size_t buflen, off_t skip);
-size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
-		       size_t buflen, off_t skip);
 
 size_t sgl_copy_sgl(struct scatterlist *d_sgl, unsigned int d_nents, off_t d_skip,
 		    struct scatterlist *s_sgl, unsigned int s_nents, off_t s_skip,
@@ -457,6 +455,24 @@ bool sgl_equal_sgl_idx(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_
 		       struct scatterlist *y_sgl, unsigned int y_nents, off_t y_skip,
 		       size_t n_bytes, size_t *miscompare_idx);
 
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+		  u8 val, size_t n_bytes);
+
+/**
+ * sg_zero_buffer - Zero-out a part of a SG list
+ * @sgl:		The SG list
+ * @nents:		Number of SG entries
+ * @buflen:		The number of bytes to zero out
+ * @skip:		Number of bytes to skip before zeroing
+ *
+ * Returns the number of bytes zeroed.
+ **/
+static inline size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
+				    size_t buflen, off_t skip)
+{
+	return sgl_memset(sgl, nents, skip, 0, buflen);
+}
+
 /*
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 6b3f1931601d..e1e729ee758f 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1056,41 +1056,6 @@ size_t sg_pcopy_to_buffer(struct scatterlist *sgl, unsigned int nents,
 }
 EXPORT_SYMBOL(sg_pcopy_to_buffer);
 
-/**
- * sg_zero_buffer - Zero-out a part of a SG list
- * @sgl:		 The SG list
- * @nents:		 Number of SG entries
- * @buflen:		 The number of bytes to zero out
- * @skip:		 Number of bytes to skip before zeroing
- *
- * Returns the number of bytes zeroed.
- **/
-size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
-		       size_t buflen, off_t skip)
-{
-	unsigned int offset = 0;
-	struct sg_mapping_iter miter;
-	unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_TO_SG;
-
-	sg_miter_start(&miter, sgl, nents, sg_flags);
-
-	if (!sg_miter_skip(&miter, skip))
-		return false;
-
-	while (offset < buflen && sg_miter_next(&miter)) {
-		unsigned int len;
-
-		len = min(miter.length, buflen - offset);
-		memset(miter.addr, 0, len);
-
-		offset += len;
-	}
-
-	sg_miter_stop(&miter);
-	return offset;
-}
-EXPORT_SYMBOL(sg_zero_buffer);
-
 /**
  * sgl_copy_sgl - Copy over a destination sgl from a source sgl
  * @d_sgl:		 Destination sgl
@@ -1274,3 +1239,46 @@ bool sgl_equal_sgl(struct scatterlist *x_sgl, unsigned int x_nents, off_t x_skip
 	return sgl_equal_sgl_idx(x_sgl, x_nents, x_skip, y_sgl, y_nents, y_skip, n_bytes, NULL);
 }
 EXPORT_SYMBOL(sgl_equal_sgl);
+
+/**
+ * sgl_memset - set byte 'val' up to n_bytes times on SG list
+ * @sgl:		 The SG list
+ * @nents:		 Number of SG entries in sgl
+ * @skip:		 Number of bytes to skip before starting
+ * @val:		 byte value to write to sgl
+ * @n_bytes:		 The (maximum) number of bytes to modify
+ *
+ * Returns:
+ *   The number of bytes written.
+ *
+ * Notes:
+ *   Stops writing if either sgl or n_bytes is exhausted. If n_bytes is
+ *   set to SIZE_MAX then val will be written to each byte until the end
+ *   of sgl.
+ *
+ *   The notes in sgl_copy_sgl() about large sgl_s apply here as well.
+ *
+ **/
+size_t sgl_memset(struct scatterlist *sgl, unsigned int nents, off_t skip,
+		  u8 val, size_t n_bytes)
+{
+	size_t offset = 0;
+	size_t len;
+	struct sg_mapping_iter miter;
+
+	if (n_bytes == 0)
+		return 0;
+	sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_TO_SG);
+	if (!sg_miter_skip(&miter, skip))
+		goto fini;
+
+	while ((offset < n_bytes) && sg_miter_next(&miter)) {
+		len = min(miter.length, n_bytes - offset);
+		memset(miter.addr, val, len);
+		offset += len;
+	}
+fini:
+	sg_miter_stop(&miter);
+	return offset;
+}
+EXPORT_SYMBOL(sgl_memset);
-- 
2.37.2



* [PATCH v2 5/5] scsi_debug: change store from vmalloc to sgl
  2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
                   ` (3 preceding siblings ...)
  2022-11-12 19:49 ` [PATCH v2 4/5] scatterlist: add sgl_memset() Douglas Gilbert
@ 2022-11-12 19:49 ` Douglas Gilbert
  4 siblings, 0 replies; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-12 19:49 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

A long time ago this driver's store was allocated by kmalloc() or
alloc_pages(). When this was switched to vmalloc() the author
noticed slower ramdisk access times and more variability in repeated
tests. So try going back, with sgl_alloc_order(), to get uniformly
sized allocations in a sometimes large scatter gather _array_. That
array is the basis of maintaining O(1) access to the store.
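
The arithmetic behind that O(1) access looks like this (a sketch
distilled from the patch below; every sgl element holds 2^elem_pow2
bytes):

  u64 lba_start = block * lb_size;                  /* byte offset in store */
  u64 sgl_i = lba_start >> sip->elem_pow2;          /* sg element index */
  u64 rem = lba_start - (sgl_i << sip->elem_pow2);  /* offset within element */
  struct scatterlist *store_sgl = sip->sgl + sgl_i; /* array step, no walk */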

Using sgl_alloc_order() and friends requires CONFIG_SGL_ALLOC
so add a 'select' to the Kconfig file.

Remove kcalloc() in resp_verify() as sgl_s can now be compared
directly without forming an intermediate buffer. This is a
performance win for the SCSI VERIFY command implementation.

Make the SCSI COMPARE AND WRITE command yield the offset of the
first miscompared byte when the compare fails (as required by
T10).

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/Kconfig      |   3 +-
 drivers/scsi/scsi_debug.c | 442 ++++++++++++++++++++++++++------------
 2 files changed, 305 insertions(+), 140 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 03e71e3d5e5b..97edb4e17319 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -1229,13 +1229,14 @@ config SCSI_DEBUG
 	tristate "SCSI debugging host and device simulator"
 	depends on SCSI
 	select CRC_T10DIF
+	select SGL_ALLOC
 	help
 	  This pseudo driver simulates one or more hosts (SCSI initiators),
 	  each with one or more targets, each with one or more logical units.
 	  Defaults to one of each, creating a small RAM disk device. Many
 	  parameters found in the /sys/bus/pseudo/drivers/scsi_debug
 	  directory can be tweaked at run time.
-	  See <http://sg.danny.cz/sg/sdebug26.html> for more information.
+	  See <https://sg.danny.cz/sg/scsi_debug.html> for more information.
 	  Mainly used for testing and best as a module. If unsure, say N.
 
 config SCSI_MESH
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 697fc57bc711..24b0bcc2affd 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -221,6 +221,7 @@ static const char *sdebug_version_date = "20210520";
 #define SDEBUG_CANQUEUE_WORDS  3	/* a WORD is bits in a long */
 #define SDEBUG_CANQUEUE  (SDEBUG_CANQUEUE_WORDS * BITS_PER_LONG)
 #define DEF_CMD_PER_LUN  SDEBUG_CANQUEUE
+#define SDEB_ORDER_TOO_LARGE 4096
 
 /* UA - Unit Attention; SA - Service Action; SSU - Start Stop Unit */
 #define F_D_IN			1	/* Data-in command (e.g. READ) */
@@ -318,8 +319,11 @@ struct sdebug_host_info {
 
 /* There is an xarray of pointers to this struct's objects, one per host */
 struct sdeb_store_info {
+	unsigned int n_elem;    /* number of sgl elements */
+	unsigned int order;	/* as used by alloc_pages() */
+	unsigned int elem_pow2;	/* PAGE_SHIFT + order */
 	rwlock_t macc_lck;	/* for atomic media access on this store */
-	u8 *storep;		/* user data storage (ram) */
+	struct scatterlist *sgl;  /* main store: n_elem array of same sized allocs */
 	struct t10_pi_tuple *dif_storep; /* protection info */
 	void *map_storep;	/* provisioning map */
 };
@@ -880,19 +884,6 @@ static inline bool scsi_debug_lbp(void)
 		(sdebug_lbpu || sdebug_lbpws || sdebug_lbpws10);
 }
 
-static void *lba2fake_store(struct sdeb_store_info *sip,
-			    unsigned long long lba)
-{
-	struct sdeb_store_info *lsip = sip;
-
-	lba = do_div(lba, sdebug_store_sectors);
-	if (!sip || !sip->storep) {
-		WARN_ON_ONCE(true);
-		lsip = xa_load(per_store_ap, 0);  /* should never be NULL */
-	}
-	return lsip->storep + lba * sdebug_sector_size;
-}
-
 static struct t10_pi_tuple *dif_store(struct sdeb_store_info *sip,
 				      sector_t sector)
 {
@@ -1001,7 +992,6 @@ static int scsi_debug_ioctl(struct scsi_device *dev, unsigned int cmd,
 				    __func__, cmd);
 	}
 	return -EINVAL;
-	/* return -ENOTTY; // correct return but upsets fdisk */
 }
 
 static void config_cdb_len(struct scsi_device *sdev)
@@ -1221,6 +1211,53 @@ static int fetch_to_dev_buffer(struct scsi_cmnd *scp, unsigned char *arr,
 	return scsi_sg_copy_to_buffer(scp, arr, arr_len);
 }
 
+static bool sdeb_sgl_cmp_buf(struct scatterlist *sgl, unsigned int nents,
+			     const void *buf, size_t buflen, off_t skip)
+{
+	bool equ = true;
+	size_t offset = 0;
+	size_t len;
+	struct sg_mapping_iter miter;
+
+	if (buflen == 0)
+		return true;
+	sg_miter_start(&miter, sgl, nents, SG_MITER_ATOMIC | SG_MITER_FROM_SG);
+	if (!sg_miter_skip(&miter, skip))
+		goto fini;
+
+	while (equ && (offset < buflen) && sg_miter_next(&miter)) {
+		len = min(miter.length, buflen - offset);
+		equ = memcmp(buf + offset, miter.addr, len) == 0;
+		offset += len;
+	}
+fini:
+	sg_miter_stop(&miter);
+	return equ;
+}
+
+static void sdeb_sgl_prefetch(struct scatterlist *sgl, unsigned int nents,
+			      off_t skip, size_t n_bytes)
+{
+	size_t offset = 0;
+	size_t len;
+	struct sg_mapping_iter miter;
+	unsigned int sg_flags = SG_MITER_FROM_SG;
+
+	if (n_bytes == 0)
+		return;
+	sg_miter_start(&miter, sgl, nents, sg_flags);
+	if (!sg_miter_skip(&miter, skip))
+		goto fini;
+
+	while ((offset < n_bytes) && sg_miter_next(&miter)) {
+		len = min(miter.length, n_bytes - offset);
+		prefetch_range(miter.addr, len);
+		offset += len;
+	}
+fini:
+	sg_miter_stop(&miter);
+}
+
 
 static char sdebug_inq_vendor_id[9] = "Linux   ";
 static char sdebug_inq_product_id[17] = "scsi_debug      ";
@@ -3008,13 +3045,14 @@ static inline struct sdeb_store_info *devip2sip(struct sdebug_dev_info *devip,
 
 /* Returns number of bytes copied or -1 if error. */
 static int do_device_access(struct sdeb_store_info *sip, struct scsi_cmnd *scp,
-			    u32 sg_skip, u64 lba, u32 num, bool do_write)
+			    u32 data_inout_off, u64 lba, u32 n_blks, bool do_write)
 {
 	int ret;
-	u64 block, rest = 0;
+	u32 lb_size = sdebug_sector_size;
+	u64 block, sgl_i, rem, lba_start, rest = 0;
 	enum dma_data_direction dir;
 	struct scsi_data_buffer *sdb = &scp->sdb;
-	u8 *fsp;
+	struct scatterlist *store_sgl;
 
 	if (do_write) {
 		dir = DMA_TO_DEVICE;
@@ -3027,25 +3065,38 @@ static int do_device_access(struct sdeb_store_info *sip, struct scsi_cmnd *scp,
 		return 0;
 	if (scp->sc_data_direction != dir)
 		return -1;
-	fsp = sip->storep;
-
 	block = do_div(lba, sdebug_store_sectors);
-	if (block + num > sdebug_store_sectors)
-		rest = block + num - sdebug_store_sectors;
+	if (block + n_blks > sdebug_store_sectors)
+		rest = block + n_blks - sdebug_store_sectors;
+	lba_start = block * lb_size;
+	sgl_i = lba_start >> sip->elem_pow2;
+	rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+	store_sgl = sip->sgl + sgl_i;	/* O(1) to each store sg element */
+
+	if (do_write)
+		ret = sgl_copy_sgl(store_sgl, sip->n_elem - sgl_i, rem,
+				   sdb->table.sgl, sdb->table.nents, data_inout_off,
+				   (n_blks - rest) * lb_size);
+	else
+		ret = sgl_copy_sgl(sdb->table.sgl, sdb->table.nents, data_inout_off,
+				   store_sgl, sip->n_elem - sgl_i, rem,
+				   (n_blks - rest) * lb_size);
 
-	ret = sg_copy_buffer(sdb->table.sgl, sdb->table.nents,
-		   fsp + (block * sdebug_sector_size),
-		   (num - rest) * sdebug_sector_size, sg_skip, do_write);
-	if (ret != (num - rest) * sdebug_sector_size)
+	if (ret != (n_blks - rest) * lb_size)
 		return ret;
 
-	if (rest) {
-		ret += sg_copy_buffer(sdb->table.sgl, sdb->table.nents,
-			    fsp, rest * sdebug_sector_size,
-			    sg_skip + ((num - rest) * sdebug_sector_size),
-			    do_write);
-	}
-
+	if (rest == 0)
+		goto fini;
+	if (do_write)
+		ret += sgl_copy_sgl(sip->sgl, sip->n_elem, 0, sdb->table.sgl,
+				    sdb->table.nents,
+				    data_inout_off + ((n_blks - rest) * lb_size),
+				    rest * lb_size);
+	else
+		ret += sgl_copy_sgl(sdb->table.sgl, sdb->table.nents,
+				    data_inout_off + ((n_blks - rest) * lb_size),
+				    sip->sgl, sip->n_elem, 0, rest * lb_size);
+fini:
 	return ret;
 }
 
@@ -3062,37 +3113,66 @@ static int do_dout_fetch(struct scsi_cmnd *scp, u32 num, u8 *doutp)
 			      num * sdebug_sector_size, 0, true);
 }
 
-/* If sip->storep+lba compares equal to arr(num), then copy top half of
- * arr into sip->storep+lba and return true. If comparison fails then
- * return false. */
+/* If the store at lba compares equal to arr(num) or scp->sdb, then if miscomp_idxp is non-NULL,
+ * copy the top half of arr into the store at lba and return true. If the comparison fails then
+ * return false and write the miscompare index via miscomp_idxp. This is the COMPARE AND WRITE case.
+ * For VERIFY(BytChk=1), set arr to NULL which causes a sgl (store) to sgl (data-out buffer)
+ * compare to be done. VERIFY(BytChk=3) sets arr to a valid address and sets miscomp_idxp
+ * to NULL.
+ */
 static bool comp_write_worker(struct sdeb_store_info *sip, u64 lba, u32 num,
-			      const u8 *arr, bool compare_only)
+			      const u8 *arr, struct scsi_cmnd *scp, size_t *miscomp_idxp)
 {
-	bool res;
-	u64 block, rest = 0;
+	bool equ;
+	u64 block, lba_start, sgl_i, rem, rest = 0;
 	u32 store_blks = sdebug_store_sectors;
-	u32 lb_size = sdebug_sector_size;
-	u8 *fsp = sip->storep;
+	const u32 lb_size = sdebug_sector_size;
+	u32 top_half = num * lb_size;
+	struct scsi_data_buffer *sdb = &scp->sdb;
+	struct scatterlist *store_sgl;
 
 	block = do_div(lba, store_blks);
 	if (block + num > store_blks)
 		rest = block + num - store_blks;
-
-	res = !memcmp(fsp + (block * lb_size), arr, (num - rest) * lb_size);
-	if (!res)
-		return res;
-	if (rest)
-		res = memcmp(fsp, arr + ((num - rest) * lb_size),
+	lba_start = block * lb_size;
+	sgl_i = lba_start >> sip->elem_pow2;
+	rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+	store_sgl = sip->sgl + sgl_i;	/* O(1) to each store sg element */
+
+	if (!arr) {	/* sgl to sgl compare */
+		equ = sgl_equal_sgl_idx(store_sgl, sip->n_elem - sgl_i, rem,
+					sdb->table.sgl, sdb->table.nents, 0,
+					(num - rest) * lb_size, miscomp_idxp);
+		if (!equ)
+			return equ;
+		if (rest > 0)
+			equ = sgl_equal_sgl_idx(sip->sgl, sip->n_elem, 0, sdb->table.sgl,
+						sdb->table.nents, (num - rest) * lb_size,
+						rest * lb_size, miscomp_idxp);
+	} else {
+		equ = sdeb_sgl_cmp_buf(store_sgl, sip->n_elem - sgl_i, arr,
+				       (num - rest) * lb_size, rem);
+		if (!equ)
+			return equ;
+		if (rest > 0)
+			equ = sdeb_sgl_cmp_buf(sip->sgl, sip->n_elem, arr +
+					       (num - rest) * lb_size, rest * lb_size, 0);
+	}
+	if (!equ || !miscomp_idxp)
+		return equ;
+
+	/* Copy "top half" of dout (args: 4, 5 and 6) into store sgl (args 1, 2 and 3) */
+	sgl_copy_sgl(store_sgl, sip->n_elem - sgl_i, rem,
+		     sdb->table.sgl, sdb->table.nents, top_half,
+		     (num - rest) * lb_size);
+	if (rest > 0) {	/* for virtual_gb need to handle wrap-around of store */
+		u32 src_off =  top_half + ((num - rest) * lb_size);
+
+		sgl_copy_sgl(sip->sgl, sip->n_elem, 0,
+			     sdb->table.sgl, sdb->table.nents, src_off,
 			     rest * lb_size);
-	if (!res)
-		return res;
-	if (compare_only)
-		return true;
-	arr += num * lb_size;
-	memcpy(fsp + (block * lb_size), arr, (num - rest) * lb_size);
-	if (rest)
-		memcpy(fsp, arr + ((num - rest) * lb_size), rest * lb_size);
-	return res;
+	}
+	return true;
 }
 
 static __be16 dif_compute_csum(const void *buf, int len)
@@ -3185,13 +3265,22 @@ static int prot_verify_read(struct scsi_cmnd *scp, sector_t start_sec,
 {
 	int ret = 0;
 	unsigned int i;
+	const u32 lb_size = sdebug_sector_size;
 	sector_t sector;
+	u64 lba, lba_start, block, rem, sgl_i;
 	struct sdeb_store_info *sip = devip2sip((struct sdebug_dev_info *)
 						scp->device->hostdata, true);
 	struct t10_pi_tuple *sdt;
+	struct scatterlist *store_sgl;
+	u8 *arr;
+
+	arr = kzalloc(lb_size, GFP_ATOMIC);
+	if (!arr)
+		return -1;	/* mkp, is this correct? */
 
 	for (i = 0; i < sectors; i++, ei_lba++) {
 		sector = start_sec + i;
+		lba = sector;
 		sdt = dif_store(sip, sector);
 
 		if (sdt->app_tag == cpu_to_be16(0xffff))
@@ -3205,11 +3294,19 @@ static int prot_verify_read(struct scsi_cmnd *scp, sector_t start_sec,
 		 * have to iterate over the PI twice.
 		 */
 		if (scp->cmnd[1] >> 5) { /* RDPROTECT */
-			ret = dif_verify(sdt, lba2fake_store(sip, sector),
-					 sector, ei_lba);
+			block = do_div(lba, sdebug_store_sectors);
+			lba_start = block * lb_size;
+			sgl_i = lba_start >> sip->elem_pow2;
+			rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+			store_sgl = sip->sgl + sgl_i;
+
+			ret = sg_copy_buffer(store_sgl, sip->n_elem - sgl_i, arr, lb_size, rem, true);
+
+			ret = dif_verify(sdt, arr, sector, ei_lba);
+
 			if (ret) {
 				dif_errors++;
-				break;
+				goto fini;
 			}
 		}
 	}
@@ -3217,6 +3314,8 @@ static int prot_verify_read(struct scsi_cmnd *scp, sector_t start_sec,
 	dif_copy_prot(scp, start_sec, sectors, true);
 	dix_reads++;
 
+fini:
+	kfree(arr);
 	return ret;
 }
 
@@ -3431,6 +3530,7 @@ static int prot_verify_write(struct scsi_cmnd *SCpnt, sector_t start_sec,
 			     unsigned int sectors, u32 ei_lba)
 {
 	int ret;
+	const u32 lb_size = sdebug_sector_size;
 	struct t10_pi_tuple *sdt;
 	void *daddr;
 	sector_t sector = start_sec;
@@ -3480,7 +3580,7 @@ static int prot_verify_write(struct scsi_cmnd *SCpnt, sector_t start_sec,
 
 			sector++;
 			ei_lba++;
-			dpage_offset += sdebug_sector_size;
+			dpage_offset += lb_size;
 		}
 		diter.consumed = dpage_offset;
 		sg_miter_stop(&diter);
@@ -3555,8 +3655,8 @@ static void map_region(struct sdeb_store_info *sip, sector_t lba,
 static void unmap_region(struct sdeb_store_info *sip, sector_t lba,
 			 unsigned int len)
 {
+	const u32 lb_size = sdebug_sector_size;
 	sector_t end = lba + len;
-	u8 *fsp = sip->storep;
 
 	while (lba < end) {
 		unsigned long index = lba_to_map_index(lba);
@@ -3566,10 +3666,26 @@ static void unmap_region(struct sdeb_store_info *sip, sector_t lba,
 		    index < map_size) {
 			clear_bit(index, sip->map_storep);
 			if (sdebug_lbprz) {  /* for LBPRZ=2 return 0xff_s */
-				memset(fsp + lba * sdebug_sector_size,
-				       (sdebug_lbprz & 1) ? 0 : 0xff,
-				       sdebug_sector_size *
-				       sdebug_unmap_granularity);
+				int val = (sdebug_lbprz & 1) ? 0 : 0xff;
+				u32 num = sdebug_unmap_granularity;
+				u64 lbaa = lba;
+				u64 rest = 0;
+				u64 block, lba_start, sgl_i, rem;
+				struct scatterlist *store_sgl;
+
+				block = do_div(lbaa, sdebug_store_sectors);
+				if (block + num > sdebug_store_sectors)
+					rest = block + num - sdebug_store_sectors;
+				lba_start = block * lb_size;
+				sgl_i = lba_start >> sip->elem_pow2;
+				rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+				store_sgl = sip->sgl + sgl_i;
+
+				sgl_memset(store_sgl, sip->n_elem - sgl_i, rem, val,
+					   num * lb_size);
+				if (rest)
+					sgl_memset(sip->sgl, sip->n_elem, 0, val,
+						   rest * lb_size);
 			}
 			if (sip->dif_storep) {
 				memset(sip->dif_storep + lba, 0xff,
@@ -3727,7 +3843,7 @@ static int resp_write_scat(struct scsi_cmnd *scp,
 	u8 wrprotect;
 	u16 lbdof, num_lrd, k;
 	u32 num, num_by, bt_len, lbdof_blen, sg_off, cum_lb;
-	u32 lb_size = sdebug_sector_size;
+	const u32 lb_size = sdebug_sector_size;
 	u32 ei_lba;
 	u64 lba;
 	int ret, res;
@@ -3885,13 +4001,12 @@ static int resp_write_same(struct scsi_cmnd *scp, u64 lba, u32 num,
 	struct scsi_device *sdp = scp->device;
 	struct sdebug_dev_info *devip = (struct sdebug_dev_info *)sdp->hostdata;
 	unsigned long long i;
-	u64 block, lbaa;
-	u32 lb_size = sdebug_sector_size;
+	u64 block, lbaa, sgl_i, lba_start, rem;
+	const u32 lb_size = sdebug_sector_size;
 	int ret;
 	struct sdeb_store_info *sip = devip2sip((struct sdebug_dev_info *)
 						scp->device->hostdata, true);
-	u8 *fs1p;
-	u8 *fsp;
+	struct scatterlist *store_sgl;
 
 	sdeb_write_lock(sip);
 
@@ -3907,15 +4022,17 @@ static int resp_write_same(struct scsi_cmnd *scp, u64 lba, u32 num,
 	}
 	lbaa = lba;
 	block = do_div(lbaa, sdebug_store_sectors);
-	/* if ndob then zero 1 logical block, else fetch 1 logical block */
-	fsp = sip->storep;
-	fs1p = fsp + (block * lb_size);
-	if (ndob) {
-		memset(fs1p, 0, lb_size);
-		ret = 0;
-	} else
-		ret = fetch_to_dev_buffer(scp, fs1p, lb_size);
 
+	/* if ndob then zero 1 logical block, else fetch 1 logical block */
+	lba_start = block * lb_size;
+	sgl_i = lba_start >> sip->elem_pow2;
+	rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+	store_sgl = sip->sgl + sgl_i;
+	ret = 0;
+	if (ndob)
+		sgl_memset(store_sgl, sip->n_elem - sgl_i, rem, 0, lb_size);
+	else
+		ret = do_device_access(sip, scp, 0, lba, 1, true);
 	if (-1 == ret) {
 		sdeb_write_unlock(sip);
 		return DID_ERROR << 16;
@@ -3926,9 +4043,11 @@ static int resp_write_same(struct scsi_cmnd *scp, u64 lba, u32 num,
 
 	/* Copy first sector to remaining blocks */
 	for (i = 1 ; i < num ; i++) {
-		lbaa = lba + i;
-		block = do_div(lbaa, sdebug_store_sectors);
-		memmove(fsp + (block * lb_size), fs1p, lb_size);
+		ret = do_device_access(sip, scp, 0, lba + i, 1, true);
+		if (-1 == ret) {
+			sdeb_write_unlock(sip);
+			return DID_ERROR << 16;
+		}
 	}
 	if (scsi_debug_lbp())
 		map_region(sip, lba, num);
@@ -3937,7 +4056,6 @@ static int resp_write_same(struct scsi_cmnd *scp, u64 lba, u32 num,
 		zbc_inc_wp(devip, lba, num);
 out:
 	sdeb_write_unlock(sip);
-
 	return 0;
 }
 
@@ -4043,15 +4161,14 @@ static int resp_write_buffer(struct scsi_cmnd *scp,
 	return 0;
 }
 
-static int resp_comp_write(struct scsi_cmnd *scp,
-			   struct sdebug_dev_info *devip)
+static int resp_comp_write(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 {
 	u8 *cmd = scp->cmnd;
-	u8 *arr;
 	struct sdeb_store_info *sip = devip2sip(devip, true);
 	u64 lba;
+	size_t miscomp_idx;
 	u32 dnum;
-	u32 lb_size = sdebug_sector_size;
+	const u32 lb_size = sdebug_sector_size;
 	u8 num;
 	int ret;
 	int retval = 0;
@@ -4074,25 +4191,21 @@ static int resp_comp_write(struct scsi_cmnd *scp,
 	if (ret)
 		return ret;
 	dnum = 2 * num;
-	arr = kcalloc(lb_size, dnum, GFP_ATOMIC);
-	if (NULL == arr) {
-		mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
-				INSUFF_RES_ASCQ);
-		return check_condition_result;
-	}
 
 	sdeb_write_lock(sip);
-
-	ret = do_dout_fetch(scp, dnum, arr);
-	if (ret == -1) {
-		retval = DID_ERROR << 16;
+	if (scp->sdb.length < dnum * lb_size || scp->sc_data_direction != DMA_TO_DEVICE) {
+		mk_sense_buffer(scp, ILLEGAL_REQUEST, PARAMETER_LIST_LENGTH_ERR, 0);
+		retval = check_condition_result;
+		if (sdebug_verbose)
+			sdev_printk(KERN_INFO, scp->device,
+				    "%s::%s: cdb indicated=%u, data-out=%u bytes\n", my_name,
+				    __func__, dnum * lb_size, scp->sdb.length);
 		goto cleanup;
-	} else if (sdebug_verbose && (ret < (dnum * lb_size)))
-		sdev_printk(KERN_INFO, scp->device, "%s: compare_write: cdb "
-			    "indicated=%u, IO sent=%d bytes\n", my_name,
-			    dnum * lb_size, ret);
-	if (!comp_write_worker(sip, lba, num, arr, false)) {
+	}
+
+	if (!comp_write_worker(sip, lba, num, NULL, scp, &miscomp_idx)) {
 		mk_sense_buffer(scp, MISCOMPARE, MISCOMPARE_VERIFY_ASC, 0);
+		scsi_set_sense_information(scp->sense_buffer, SCSI_SENSE_BUFFERSIZE, miscomp_idx);
 		retval = check_condition_result;
 		goto cleanup;
 	}
@@ -4100,7 +4213,6 @@ static int resp_comp_write(struct scsi_cmnd *scp,
 		map_region(sip, lba, num);
 cleanup:
 	sdeb_write_unlock(sip);
-	kfree(arr);
 	return retval;
 }
 
@@ -4246,12 +4358,12 @@ static int resp_pre_fetch(struct scsi_cmnd *scp,
 			  struct sdebug_dev_info *devip)
 {
 	int res = 0;
-	u64 lba;
-	u64 block, rest = 0;
+	const u32 lb_size = sdebug_sector_size;
+	u64 lba, block, sgl_i, rem, lba_start, rest = 0;
 	u32 nblks;
 	u8 *cmd = scp->cmnd;
 	struct sdeb_store_info *sip = devip2sip(devip, true);
-	u8 *fsp = sip->storep;
+	struct scatterlist *store_sgl;
 
 	if (cmd[0] == PRE_FETCH) {	/* 10 byte cdb */
 		lba = get_unaligned_be32(cmd + 2);
@@ -4264,21 +4376,21 @@ static int resp_pre_fetch(struct scsi_cmnd *scp,
 		mk_sense_buffer(scp, ILLEGAL_REQUEST, LBA_OUT_OF_RANGE, 0);
 		return check_condition_result;
 	}
-	if (!fsp)
-		goto fini;
 	/* PRE-FETCH spec says nothing about LBP or PI so skip them */
 	block = do_div(lba, sdebug_store_sectors);
 	if (block + nblks > sdebug_store_sectors)
 		rest = block + nblks - sdebug_store_sectors;
+	lba_start = block * lb_size;
+	sgl_i = lba_start >> sip->elem_pow2;
+	rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+	store_sgl = sip->sgl + sgl_i;	/* O(1) to each store sg element */
 
 	/* Try to bring the PRE-FETCH range into CPU's cache */
 	sdeb_read_lock(sip);
-	prefetch_range(fsp + (sdebug_sector_size * block),
-		       (nblks - rest) * sdebug_sector_size);
+	sdeb_sgl_prefetch(store_sgl, sip->n_elem - sgl_i, rem, (nblks - rest) * lb_size);
 	if (rest)
-		prefetch_range(fsp, rest * sdebug_sector_size);
+		sdeb_sgl_prefetch(sip->sgl, sip->n_elem, 0, rest * lb_size);
 	sdeb_read_unlock(sip);
-fini:
 	if (cmd[1] & 0x2)
 		res = SDEG_RES_IMMED_MASK;
 	return res | condition_met_result;
@@ -4395,7 +4507,7 @@ static int resp_verify(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 	u32 vnum, a_num, off;
 	const u32 lb_size = sdebug_sector_size;
 	u64 lba;
-	u8 *arr;
+	u8 *arr = NULL;
 	u8 *cmd = scp->cmnd;
 	struct sdeb_store_info *sip = devip2sip(devip, true);
 
@@ -4429,30 +4541,31 @@ static int resp_verify(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 	if (ret)
 		return ret;
 
-	arr = kcalloc(lb_size, vnum, GFP_ATOMIC);
-	if (!arr) {
-		mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC,
-				INSUFF_RES_ASCQ);
-		return check_condition_result;
+	if (is_bytchk3) {
+		arr = kcalloc(lb_size, vnum, GFP_ATOMIC);
+		if (!arr) {
+			mk_sense_buffer(scp, ILLEGAL_REQUEST, INSUFF_RES_ASC, INSUFF_RES_ASCQ);
+			return check_condition_result;
+		}
 	}
 	/* Not changing store, so only need read access */
 	sdeb_read_lock(sip);
 
-	ret = do_dout_fetch(scp, a_num, arr);
-	if (ret == -1) {
-		ret = DID_ERROR << 16;
-		goto cleanup;
-	} else if (sdebug_verbose && (ret < (a_num * lb_size))) {
-		sdev_printk(KERN_INFO, scp->device,
-			    "%s: %s: cdb indicated=%u, IO sent=%d bytes\n",
-			    my_name, __func__, a_num * lb_size, ret);
-	}
 	if (is_bytchk3) {
+		ret = do_dout_fetch(scp, a_num, arr);
+		if (ret == -1) {
+			ret = DID_ERROR << 16;
+			goto cleanup;
+		} else if (sdebug_verbose && (ret < (a_num * lb_size))) {
+			sdev_printk(KERN_INFO, scp->device,
+				    "%s: %s: cdb indicated=%u, IO sent=%d bytes\n",
+				    my_name, __func__, a_num * lb_size, ret);
+		}
 		for (j = 1, off = lb_size; j < vnum; ++j, off += lb_size)
 			memcpy(arr + off, arr, lb_size);
 	}
 	ret = 0;
-	if (!comp_write_worker(sip, lba, vnum, arr, true)) {
+	if (!comp_write_worker(sip, lba, vnum, arr, scp, NULL)) {
 		mk_sense_buffer(scp, MISCOMPARE, MISCOMPARE_VERIFY_ASC, 0);
 		ret = check_condition_result;
 		goto cleanup;
@@ -4831,9 +4944,16 @@ static void zbc_rwp_zone(struct sdebug_dev_info *devip,
 	if (zsp->z_cond == ZC4_CLOSED)
 		devip->nr_closed--;
 
-	if (zsp->z_wp > zsp->z_start)
-		memset(sip->storep + zsp->z_start * sdebug_sector_size, 0,
-		       (zsp->z_wp - zsp->z_start) * sdebug_sector_size);
+	if (zsp->z_wp > zsp->z_start) {
+		u32 lb_size = sdebug_sector_size;
+		u64 lba_start = zsp->z_start * lb_size;
+		u64 sgl_i = lba_start >> sip->elem_pow2;
+		u64 rem = lba_start - (sgl_i ? (sgl_i << sip->elem_pow2) : 0);
+		struct scatterlist *store_sgl = sip->sgl + sgl_i;
+
+		sgl_memset(store_sgl, sip->n_elem - sgl_i, rem, 0,
+			   (zsp->z_wp - zsp->z_start) * lb_size);
+	}
 
 	zsp->z_non_seq_resource = false;
 	zsp->z_wp = zsp->z_start;
@@ -6051,15 +6171,15 @@ static int scsi_debug_show_info(struct seq_file *m, struct Scsi_Host *host)
 				   sdhp->shost->host_no, idx);
 			++j;
 		}
-		seq_printf(m, "\nper_store array [most_recent_idx=%d]:\n",
+		seq_printf(m, "\nper_store array [most_recent_idx=%d] sgl_s:\n",
 			   sdeb_most_recent_idx);
 		j = 0;
 		xa_for_each(per_store_ap, l_idx, sip) {
 			niu = xa_get_mark(per_store_ap, l_idx,
 					  SDEB_XA_NOT_IN_USE);
 			idx = (int)l_idx;
-			seq_printf(m, "  %d: idx=%d%s\n", j, idx,
-				   (niu ? "  not_in_use" : ""));
+			seq_printf(m, "  %d: idx=%d%s, n_elems=%u, elem_sz=%u\n", j, idx,
+				   (niu ? "  not_in_use" : ""), sip->n_elem, 1 << sip->elem_pow2);
 			++j;
 		}
 	}
@@ -7178,7 +7298,8 @@ static void sdebug_erase_store(int idx, struct sdeb_store_info *sip)
 	}
 	vfree(sip->map_storep);
 	vfree(sip->dif_storep);
-	vfree(sip->storep);
+	if (sip->sgl)
+		sgl_free_n_order(sip->sgl, sip->n_elem, sip->order);
 	xa_erase(per_store_ap, idx);
 	kfree(sip);
 }
@@ -7199,6 +7320,41 @@ static void sdebug_erase_all_stores(bool apart_from_first)
 		sdeb_most_recent_idx = sdeb_first_idx;
 }
 
+/* Want a uniform sg element size; the last element can be smaller. */
+static int sdeb_store_sgat(struct sdeb_store_info *sip, int sz_mib)
+{
+	unsigned int order;
+	unsigned long sz_b = (unsigned long)sz_mib * 1048576;
+	gfp_t mask_ap = GFP_KERNEL | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
+
+	if (sz_mib <= 128)
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 32 * 1024));
+	else if (sz_mib <= 256)
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 64 * 1024));
+	else if (sz_mib <= 512)
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 128 * 1024));
+	else if (sz_mib <= 1024)
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 256 * 1024));
+	else if (sz_mib <= 2048)
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 512 * 1024));
+	else
+		order = get_order(max_t(unsigned int, PAGE_SIZE, 1024 * 1024));
+	sip->sgl = sgl_alloc_order(sz_b, order, false, mask_ap, &sip->n_elem);
+	if (!sip->sgl && order > 0) {
+		sip->sgl = sgl_alloc_order(sz_b, --order, false, mask_ap, &sip->n_elem);
+		if (!sip->sgl && order > 0)
+			sip->sgl = sgl_alloc_order(sz_b, --order, false, mask_ap, &sip->n_elem);
+	}
+	if (!sip->sgl) {
+		pr_info("%s: unable to obtain %d MiB, last element size: %u kiB\n", __func__,
+			sz_mib, (1 << (PAGE_SHIFT + order)) / 1024);
+		return -ENOMEM;
+	}
+	sip->order = order;
+	sip->elem_pow2 = PAGE_SHIFT + order;
+	return 0;
+}
+
 /*
  * Returns store xarray new element index (idx) if >=0 else negated errno.
  * Limit the number of stores to 65536.
@@ -7230,13 +7386,21 @@ static int sdebug_add_store(void)
 	xa_unlock_irqrestore(per_store_ap, iflags);
 
 	res = -ENOMEM;
-	sip->storep = vzalloc(sz);
-	if (!sip->storep) {
-		pr_err("user data oom\n");
+	res = sdeb_store_sgat(sip, sdebug_dev_size_mb);
+	if (res) {
+		pr_err("sgat: user data oom\n");
 		goto err;
 	}
-	if (sdebug_num_parts > 0)
-		sdebug_build_parts(sip->storep, sz);
+	if (sdebug_num_parts > 0) {
+		const int a_len = 1024;
+		u8 *arr = kzalloc(a_len, GFP_KERNEL);
+
+		if (arr) {
+			sdebug_build_parts(arr, sz);
+			sg_copy_from_buffer(sip->sgl, sip->n_elem, arr, a_len);
+			kfree(arr);
+		}
+	}
 
 	/* DIF/DIX: what T10 calls Protection Information (PI) */
 	if (sdebug_dix) {
-- 
2.37.2



* Re: [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit
  2022-11-12 19:49 ` [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit Douglas Gilbert
@ 2022-11-15 20:33   ` Jason Gunthorpe
  2022-11-16  0:20     ` Douglas Gilbert
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2022-11-15 20:33 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, martin.petersen, jejb, hare, bvanassche, bostroesser

On Sat, Nov 12, 2022 at 02:49:35PM -0500, Douglas Gilbert wrote:
> This patch fixes a check done by sgl_alloc_order() before it starts
> any allocations. The comment in the original said: "Check for integer
> overflow" but the right hand side of the expression in the condition
> is resolved as u32 so it can not exceed UINT32_MAX (4 GiB) which
> means 'length' can not exceed that value.
> 
> This function may be used to replace vmalloc(unsigned long) for a
> large allocation (e.g. a ramdisk). vmalloc has no limit at 4 GiB so
> it seems unreasonable that sgl_alloc_order() whose length type is
> unsigned long long should be limited to 4 GB.
> 
> Solutions to this issue were discussed by Jason Gunthorpe
> <jgg@ziepe.ca> and Bodo Stroesser <bostroesser@gmail.com>. This
> version is based on a linux-scsi post by Jason titled: "Re:
> [PATCH v7 1/4] sgl_alloc_order: remove 4 GiB limit" dated 20220201.
> 
> An earlier patch fixed a memory leak in sgl_alloc_order() due to the
> misuse of sgl_free(). Take the opportunity to put a one line comment
> above sgl_free()'s declaration warning that it is not suitable when
> order > 0 .
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Bodo Stroesser <bostroesser@gmail.com>
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>  include/linux/scatterlist.h |  1 +
>  lib/scatterlist.c           | 23 ++++++++++++++---------
>  2 files changed, 15 insertions(+), 9 deletions(-)

I still prefer the version I posted here:

https://lore.kernel.org/linux-scsi/Y1aDQznakNaWD8kd@ziepe.ca/

Jason


* Re: [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit
  2022-11-15 20:33   ` Jason Gunthorpe
@ 2022-11-16  0:20     ` Douglas Gilbert
  2022-11-16  0:39       ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Douglas Gilbert @ 2022-11-16  0:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-scsi, martin.petersen, jejb, hare, bvanassche, bostroesser

On 2022-11-15 15:33, Jason Gunthorpe wrote:
> On Sat, Nov 12, 2022 at 02:49:35PM -0500, Douglas Gilbert wrote:
>> This patch fixes a check done by sgl_alloc_order() before it starts
>> any allocations. The comment in the original said: "Check for integer
>> overflow" but the right hand side of the expression in the condition
>> is resolved as u32 so it can not exceed UINT32_MAX (4 GiB) which
>> means 'length' can not exceed that value.
>>
>> This function may be used to replace vmalloc(unsigned long) for a
>> large allocation (e.g. a ramdisk). vmalloc has no limit at 4 GiB so
>> it seems unreasonable that sgl_alloc_order() whose length type is
>> unsigned long long should be limited to 4 GB.
>>
>> Solutions to this issue were discussed by Jason Gunthorpe
>> <jgg@ziepe.ca> and Bodo Stroesser <bostroesser@gmail.com>. This
>> version is based on a linux-scsi post by Jason titled: "Re:
>> [PATCH v7 1/4] sgl_alloc_order: remove 4 GiB limit" dated 20220201.
>>
>> An earlier patch fixed a memory leak in sgl_alloc_order() due to the
>> misuse of sgl_free(). Take the opportunity to put a one line comment
>> above sgl_free()'s declaration warning that it is not suitable when
>> order > 0 .
>>
>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Bodo Stroesser <bostroesser@gmail.com>
>> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
>> ---
>>   include/linux/scatterlist.h |  1 +
>>   lib/scatterlist.c           | 23 ++++++++++++++---------
>>   2 files changed, 15 insertions(+), 9 deletions(-)
> 
> I still prefer the version I posted here:
> 
> https://lore.kernel.org/linux-scsi/Y1aDQznakNaWD8kd@ziepe.ca/

Three reasons that I don't (the two signatures in question are
sketched after this list):
   1) making the first argument of type size_t may constrict the size
      that can be allocated on a 32-bit machine (faint recollection of
      extended/expanded memory on the 8086). uint64_t would be better
      than unsigned long long, but see point 3)
   2) making the last (fifth) argument of type size_t is overkill on a
      64-bit machine. IMO 32 bits is sufficient. The maximum unsigned
      int is 2^32 - 1, and with a typical PAGE_SIZE of 4096 bytes and
      order 0, that allows roughly 2^44 bytes, or about 16 TiB. If part
      of the kernel did want 16 TiB in a single allocation, I hope it
      would choose a larger value for order, in which case the maximum
      single allocation would be 2^(44+MAX_ORDER-1) bytes. Can I stop
      now?
   3) it changes the signature of an existing exported kernel function,
      requiring changes at several call sites. Changing an output
      pointer type may require more than a one-line change at the
      existing call sites. Because this patch only removes an existing
      4 GiB limit, those call sites have zero need for this. If I were
      maintaining the driver containing those call sites, I would be
      a bit peeved. [That said, maintaining out-of-tree patchsets while
      trying to get them accepted into the mainline is a considerable
      pain due to the constant changes in the block layer API.]
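
For reference, here are the two prototypes side by side. The current
one is as declared in include/linux/scatterlist.h; the size_t variant
is reconstructed from the points above rather than copied from Jason's
linked post, so its exact form may differ:

  /* Current prototype, as declared in include/linux/scatterlist.h: */
  struct scatterlist *sgl_alloc_order(unsigned long long length,
                                      unsigned int order, bool chainable,
                                      gfp_t gfp, unsigned int *nent_p);

  /* size_t variant (reconstructed): on a 32-bit machine size_t is
   * 32 bits, shrinking the maximum length (point 1), and the
   * size_t *nent_p output forces call-site changes for no practical
   * gain (points 2 and 3).
   */
  struct scatterlist *sgl_alloc_order(size_t length,
                                      unsigned int order, bool chainable,
                                      gfp_t gfp, size_t *nent_p);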

Doug Gilbert

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit
  2022-11-16  0:20     ` Douglas Gilbert
@ 2022-11-16  0:39       ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-11-16  0:39 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, martin.petersen, jejb, hare, bvanassche, bostroesser

On Tue, Nov 15, 2022 at 07:20:13PM -0500, Douglas Gilbert wrote:
> On 2022-11-15 15:33, Jason Gunthorpe wrote:
> > On Sat, Nov 12, 2022 at 02:49:35PM -0500, Douglas Gilbert wrote:
> > > This patch fixes a check done by sgl_alloc_order() before it starts
> > > any allocations. The comment in the original said: "Check for integer
> > > overflow", but the right-hand side of the expression in the condition
> > > is evaluated as u32, so it cannot exceed UINT32_MAX, which means
> > > 'length' cannot exceed 4 GiB.
> > >
> > > This function may be used to replace vmalloc(unsigned long) for a
> > > large allocation (e.g. a ramdisk). vmalloc() has no 4 GiB limit, so
> > > it seems unreasonable that sgl_alloc_order(), whose length type is
> > > unsigned long long, should be limited to 4 GiB.
> > >
> > > Solutions to this issue were discussed by Jason Gunthorpe
> > > <jgg@ziepe.ca> and Bodo Stroesser <bostroesser@gmail.com>. This
> > > version is based on a linux-scsi post by Jason titled: "Re:
> > > [PATCH v7 1/4] sgl_alloc_order: remove 4 GiB limit" dated 20220201.
> > >
> > > An earlier patch fixed a memory leak in sgl_alloc_order() due to the
> > > misuse of sgl_free(). Take the opportunity to put a one-line comment
> > > above sgl_free()'s declaration warning that it is not suitable when
> > > order > 0.
> > > 
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Bodo Stroesser <bostroesser@gmail.com>
> > > Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> > > ---
> > >   include/linux/scatterlist.h |  1 +
> > >   lib/scatterlist.c           | 23 ++++++++++++++---------
> > >   2 files changed, 15 insertions(+), 9 deletions(-)
> > 
> > I still prefer the version I posted here:
> > 
> > https://lore.kernel.org/linux-scsi/Y1aDQznakNaWD8kd@ziepe.ca/
> 
> Three reasons that I don't:
>   1) making the first argument of type size_t may constrict the size
>      that can be allocated on a 32-bit machine (faint recollection of
>      extended/expanded memory on the 8086). uint64_t would be better
>      than unsigned long long, but see point 3)

32-bit machines can't kmap more than size_t can address, so this is
not correct. We can't put sgl tables into highmem.

>   2) making the last (fifth) argument of type size_t is overkill on a
>      64-bit machine. IMO 32 bits is sufficient.

IIRC, I changed it so that integer promotion/truncation issues are
obviously avoided. It is better to handle those with correct typing
than to introduce a bunch of fragile checks. We don't need to worry
about the extra 32 bits in something like this.
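
To make the truncation concrete, here is a minimal user-space sketch
(not the kernel code; PAGE_SHIFT is assumed to be 12, and the kernel's
round_up() step is elided since 5 GiB is already page-aligned):

  #include <stdio.h>

  int main(void)
  {
          unsigned long long length = 5ULL << 30; /* ask for 5 GiB */
          unsigned int page_shift = 12, order = 0;
          unsigned int nent = length >> (page_shift + order);

          /* Old check: the shift is evaluated in 32 bits and wraps
           * (here to 1 GiB), so any length >= 4 GiB is falsely
           * flagged as overflow. */
          if (length > (nent << (page_shift + order)))
                  printf("u32 arithmetic: 5 GiB is rejected\n");

          /* Widening the operand first keeps the full value. */
          if (length <= ((unsigned long long)nent << (page_shift + order)))
                  printf("u64 arithmetic: 5 GiB is accepted\n");
          return 0;
  }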

>   3) it changes the signature of an existing exported kernel function,
>      requiring changes at several call sites.

So fix them. That is why we have one git tree. You'll get sympathy if
there are more than 5-10 call sites :)

>      type may require more than a one-line change at the existing call
>      sites. Because this patch only removes an existing 4 GiB limit,
>      those call sites have zero need for this. If I were maintaining
>      the driver containing those call sites, I would be a bit peeved.

Uh, if someone is "peeved", they do not understand how kernel APIs
are expected to evolve, I think.

It should be two patches: one to correct the types in the function
signature, and another to resolve the 4 GiB problem.

>     [That said, maintaining out-of-tree patchsets while trying to
>      get them accepted into the mainline is a considerable pain due
>      to the constant changes in the block layer API.]

Which is consistent with how the community views in-kernel APIs.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function
  2022-11-12 19:49 ` [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
@ 2022-11-16  5:59   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2022-11-16  5:59 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, martin.petersen, jejb, hare, bvanassche, bostroesser, jgg

On Sat, Nov 12, 2022 at 02:49:36PM -0500, Douglas Gilbert wrote:
> Both the SCSI and NVMe subsystems receive user data from the block
> layer in scatterlist_s

No, they don't.  For one thing there is no 'scatterlist_s', and for
another, no one receives it.  Block drivers need to generate it
using the blk_rq_map_sg() helper.
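
The pattern being pointed at looks roughly like this in a blk-mq
driver's ->queue_rq() (a sketch only; 'demo_dev' and its preallocated
sg_table are hypothetical, not a real driver):

  static blk_status_t demo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                    const struct blk_mq_queue_data *bd)
  {
          struct request *rq = bd->rq;
          struct demo_dev *dev = hctx->queue->queuedata;
          int nents;

          blk_mq_start_request(rq);

          /* The driver generates the scatterlist itself from the
           * request; dev->sgt was sized for the queue's maximum
           * segment count at init time (not shown). */
          nents = blk_rq_map_sg(rq->q, rq, dev->sgt.sgl);
          if (nents <= 0)
                  return BLK_STS_IOERR;

          /* ... DMA-map or copy via dev->sgt.sgl, then complete ... */
          blk_mq_end_request(rq, BLK_STS_OK);
          return BLK_STS_OK;
  }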

> (aka scatter gather lists (sgl) which are
> often arrays). If drivers in those subsystems represent storage
> (e.g. a ramdisk) or cache "hot" user data then they may also
> choose to use scatterlist_s. Currently there are no sgl to sgl
> operations in the kernel. Start with a sgl to sgl copy. Stops
> when the first of the number of requested bytes to copy, or the
> source sgl, or the destination sgl is exhausted. So the
> destination sgl will _not_ grow.

No, the scatterlist is a bad data structure, but for now we have
to use it for dma-mapping non-contiguous pieces of memory.  For
everything else it absolutely should not be used, and we should
not add helpers to facilitate that.

NAK for this and the other helpers.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread

Thread overview: 10+ messages
2022-11-12 19:49 [PATCH v2 0/5] scatterlist: add operations for scsi_debug Douglas Gilbert
2022-11-12 19:49 ` [PATCH v2 1/5] sgl_alloc_order: remove 4 GiB limit Douglas Gilbert
2022-11-15 20:33   ` Jason Gunthorpe
2022-11-16  0:20     ` Douglas Gilbert
2022-11-16  0:39       ` Jason Gunthorpe
2022-11-12 19:49 ` [PATCH v2 2/5] scatterlist: add sgl_copy_sgl() function Douglas Gilbert
2022-11-16  5:59   ` Christoph Hellwig
2022-11-12 19:49 ` [PATCH v2 3/5] scatterlist: add sgl_equal_sgl() function Douglas Gilbert
2022-11-12 19:49 ` [PATCH v2 4/5] scatterlist: add sgl_memset() Douglas Gilbert
2022-11-12 19:49 ` [PATCH v2 5/5] scsi_debug: change store from vmalloc to sgl Douglas Gilbert
